Blended Representative Periods with Tulipa Clustering

Introduction

Using representative periods is a simplification method to reduce the size of the problem. Instead of solving for every time period, the model solves for a few chosen representatives of the data. The original data is then reconstructed or approximated by blending the representatives.

Tulipa uses the package TulipaClustering.jl to choose representatives and cluster input data.

Set up the environment

Add the new packages:

using Pkg: Pkg
Pkg.activate(".")
Pkg.add("TulipaClustering")
Pkg.add("Distances")

Import packages:

import TulipaIO as TIO
import TulipaEnergyModel as TEM
import TulipaClustering as TC
using DuckDB
using DataFrames
using Plots
using Distances

Question: Do you remember how to install the two new libraries into your environment?

Set up the workflow

The data for this tutorial can be found in the folder my-awesome-energy-system/tutorial-4

Load the data:

connection = DBInterface.connect(DuckDB.DB)
input_dir = joinpath(@__DIR__, "my-awesome-energy-system/tutorial-4")
TIO.read_csv_folder(connection, input_dir)
DuckDB.DB(":memory:")
Tip

Remember that you can always define and create the output directory if it doesn't exist to export the results to csv files. Then you can use the output_folder keyword argument in the run_scenario function to save the results in that folder.

Try to run the problem as usual:

TEM.populate_with_defaults!(connection)
energy_problem = TEM.run_scenario(connection)

Uh oh! It doesn't work. Why not?

ERROR: DataValidationException: The following issues were found in the data:
- Column 'milestone_year' of table 'rep_periods_data' does not have a default

In previous tutorials, the representative periods tables were available in the input directories asumming you had only one representative period for the whole year, e.g., have a look at the tables profiles_rep_periods, rep_periods_data, rep_periods_mapping, and timeframe_data in tutorials 1 to 3.

Now, we will learn how to generate these tables using TulipaClustering! 😉

Using TulipaClustering

Explore the Profiles Data

Let's first take a look at the profiles data we have by looking at the file profiles-wide.csv in the input directory. You can also use TulipaIO to read the table and see its contents by displaying the first 12 rows of the table:

profiles_wide_df = TIO.get_table(connection, "profiles_wide")
first(profiles_wide_df, 12)
12×5 DataFrame
Rowmilestone_yeartimestepavailability-windavailability-solardemand-demand
Int64Int64Float64Float64Float64
1203010.9757440.00.633142
2203020.9744820.00.625239
3203030.9748120.00.62111
4203040.977630.00.619755
5203050.9762980.00.620001
6203060.9769180.00.627268
7203070.9769790.00.667478
8203080.9729720.00.696503
9203090.9730270.00.711401
102030100.9766690.00.729661
112030110.9772410.09759990.736751
122030120.9780.1640520.739043

The wide is very convenient for humans to read, but not so much for computers to process. We need to transform it into a long format first.

Transforming the Format from Wide to Long

TulipaClustering provides a function transform_wide_to_long! that does exactly that:

TC.transform_wide_to_long!(
    connection,
    "profiles_wide",
    "profiles";
    exclude_columns = ["milestone_year", "timestep"]
)

Let's have a look at the new table at the first 5 rows and the last 5 rows:

profiles_df = TIO.get_table(connection, "profiles")
first(profiles_df, 5)
5×4 DataFrame
Rowmilestone_yeartimestepprofile_namevalue
Int64Int64StringFloat64
120301availability-solar0.0
220302availability-solar0.0
320303availability-solar0.0
420304availability-solar0.0
520305availability-solar0.0
last(profiles_df, 5)
5×4 DataFrame
Rowmilestone_yeartimestepprofile_namevalue
Int64Int64StringFloat64
120308756demand-demand0.916969
220308757demand-demand0.879445
320308758demand-demand0.840501
420308759demand-demand0.805736
520308760demand-demand0.774246

The new table stacks the columns with profile data into one column called value and adds a new column called profile_name that contains the names of the profiles. Each row now corresponds to one timestep of one profile per year.

Clustering the Profiles

We can perform the clustering by using the cluster! function from TulipaClustering by passing the connection with the profiles table and two extra arguments:

  • period_duration: How long are the periods (e.g., 24 for daily periods if the timestep is hourly);
  • num_rps: How many representative periods.

Let's first use the function cluster! with its default parameters. Have a look at the default definition if you're curious.

period_duration = 24
num_rps = 4
# A layout is required for the clustering to know which name is used for year
layout = TC.ProfilesTableLayout(year = :milestone_year)
clusters = TC.cluster!(connection, period_duration, num_rps; layout = layout)
Dict{DataFrames.GroupKey{DataFrames.GroupedDataFrame{DataFrames.DataFrame}}, TulipaClustering.ClusteringResult} with 1 entry:
  GroupKey: (milestone_year = 2030,) => ClusteringResult(288×5 DataFrame…

Explore the results by looking at the new tables created, e.g., profiles_rep_periods, rep_periods_data, rep_periods_mapping, and timeframe_data. Remember that you can use TIO.get_table(connection, "table_name") to explore the tables.

Now, let's have a look at the clustering results by plotting the representative periods:

df = TIO.get_table(connection, "profiles_rep_periods")
rep_periods = unique(df.rep_period)
plots = []

for rp in rep_periods
    df_rp = filter(row -> row.rep_period == rp, df)
    p = plot(size=(400, 300), title="Representative Period $rp")

    for group in groupby(df_rp, :profile_name)
        name = group.profile_name[1]
        plot!(p, group.timestep, group.value, label=name)
    end

    show_legend = (rp == rep_periods[1])
    plot!(p,
          xlabel="Timestep",
          ylabel="Value",
          xticks=0:2:period_duration,
          xlim=(1, period_duration),
          ylim=(0, 1),
          legend=show_legend ? :bottomleft : false,
          legendfontsize=6
         )
    push!(plots, p)
end

plot(plots..., layout=(2, 2), size=(800, 600))
Example block output

Nice! But, do you know we can do better than this? Yes, we can! Let's explore a more advanced clustering method.

Hull Clustering with Blended Representative Periods

The function cluster! has several keyword arguments that can be used to customize the clustering process. Alternatively, you can use the help mode in Julia REPL by typing ?cluster! to see all the available keyword arguments and their descriptions. Here is a summary of the most important keyword arguments for this example:

  • method (default :k_medoids): clustering method to use :k_means, :k_medoids, :convex_hull, :convex_hull_with_null, or :conical_hull.
  • distance (default Distances.Euclidean()): semimetric used to measure distance between data points from the the package Distances.jl.
  • weight_type (default :dirac): the type of weights to find; possible values are:
    • :dirac: each period is represented by exactly one representative period (a one unit weight and the rest are zeros)
    • :convex: each period is represented as a convex sum of the representative periods (a sum with nonnegative weights adding into one)
    • :conical: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights)
    • :conical_bounded: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights) with the total weight bounded from above by one.

As you can see, there are several keyword arguments that can be combined to explore different clustering strategies. Our proposed method is the Hull Clustering with Blended Representative Periods, see the references in the TulipaClustering package. To activate this method you need to set up the following keyword arguments:

  • method = :convex_hull
  • distance = Distances.CosineDist()
  • weight_type = :convex

So, let's cluster again using the proposed method:

using Distances
clusters = TC.cluster!(connection,
                    period_duration,
                    num_rps;
                    method = :convex_hull,
                    distance = Distances.CosineDist(),
                    weight_type = :convex,
                    layout = layout,
                    )

# Let's have a look at the first 10 rows of the new rep_periods_mapping table
first(TIO.get_table(connection, "rep_periods_mapping"), 10)
10×4 DataFrame
Rowmilestone_yearperiodrep_periodweight
Int64Int64Int64Float64
12030110.0129126
22030120.902497
32030130.0845909
42030210.0300856
52030220.885887
62030230.0316205
72030240.0524069
82030320.875874
92030330.124126
102030420.175697

What do you notice about the new representative periods mapping?

Let's plot again the resulting representative periods, but this time using the clustered profiles with the hull clustering method:

df = TIO.get_table(connection, "profiles_rep_periods")
rep_periods = unique(df.rep_period)
plots = []

for rp in rep_periods
    df_rp = filter(row -> row.rep_period == rp, df)
    p = plot(size=(400, 300), title="Hull Clustering RP $rp")

    for group in groupby(df_rp, :profile_name)
        name = group.profile_name[1]
        plot!(p, group.timestep, group.value, label=name)
    end

    show_legend = (rp == rep_periods[1])
    plot!(p,
          xlabel="Timestep",
          ylabel="Value",
          xticks=0:2:period_duration,
          xlim=(1, period_duration),
          ylim=(0, 1),
          legend=show_legend ? :topleft : false,
          legendfontsize=6
         )
    push!(plots, p)
end

plot(plots..., layout=(2, 2), size=(800, 600))
Example block output

The first difference you may notice is that the representative periods (RPs) obtained with hull clustering are more extreme than those obtained with the default method. This is because hull clustering selects RPs that are more likely to be constraint-binding in an optimization model.

The Projected Gradient Descent Parameters

The parameters niters and learning_rate tell for how many iterations to run the descent and by how much to adjust the weights in each iterations. More iterations make the method slower but produce better results. Larger learning rate makes the method converge faster but in a less stable manner (i.e., weights might start going up and down a lot from iteration to iteration). Sometimes you need to find the right balance for yourself. In general, if the weights produced by the method look strange, try decreasing the learning rate and/or increasing the number of iterations.

Running the Model

To run the model, add the data to the system with TulipaIO and then run it as usual:

TEM.populate_with_defaults!(connection)
energy_problem = TEM.run_scenario(connection);
EnergyProblem:
  - Model created!
    - Number of variables: 1056
    - Number of constraints for variable bounds: 960
    - Number of structural constraints: 1344
  - Model solved!
    - Termination status: OPTIMAL
    - Objective value: 1.6915474820088142e8
    - Objective breakdown:
      - assets_fixed_cost_aggregated_vintage_method: 0.0
      - assets_fixed_cost_compact_vintage_method: 0.0
      - assets_investment_cost: 0.0
      - flows_fixed_cost: 0.0
      - flows_investment_cost: 0.0
      - flows_operational_cost: 1.6915474820088142e8
      - storage_assets_energy_fixed_cost: 0.0
      - storage_assets_energy_investment_cost: 0.0
      - units_on_operational_cost: 0.0
      - vintage_flows_operational_cost: 0.0

Interpreting the Results

To plot the results, first read the data with TulipaIO and filter what's needed (and rename time_block_start to timestep while you're at it):

flows = TIO.get_table(connection, "var_flow")

select!(
    flows,
    :from_asset,
    :to_asset,
    :milestone_year,
    :rep_period,
    :time_block_start => :timestep,
    :solution
)

from_asset = "ccgt"
to_asset = "e_demand"
year = 2030

filtered_flow = filter(
    row ->
        row.from_asset == from_asset &&
            row.to_asset == to_asset &&
            row.milestone_year == year,
    flows,
)
# display the first 10 rows of the filtered flow
first(filtered_flow, 10)
10×6 DataFrame
Rowfrom_assetto_assetmilestone_yearrep_periodtimestepsolution
StringStringInt32Int32Int32Float64
1ccgte_demand203011615.851
2ccgte_demand203012616.034
3ccgte_demand203013623.815
4ccgte_demand203014643.319
5ccgte_demand203015689.316
6ccgte_demand203016707.398
7ccgte_demand203017745.526
8ccgte_demand203018714.807
9ccgte_demand203019610.763
10ccgte_demand2030110510.414

To reinterpret the RP data as base periods data, first create a new dataframe that contains both by using the inner join operation:

rep_periods_mapping = TIO.get_table(connection, "rep_periods_mapping")
df = innerjoin(filtered_flow, rep_periods_mapping, on=[:milestone_year, :rep_period])
# display the first 10 rows of the new dataframe
first(df, 10)
10×9 DataFrame
Rowfrom_assetto_assetmilestone_yearrep_periodtimestepsolutionperiodweightscenario
StringStringInt32Int32Int32Float64Int32Float64Int32
1ccgte_demand203011615.85110.01291261
2ccgte_demand203012616.03410.01291261
3ccgte_demand203013623.81510.01291261
4ccgte_demand203014643.31910.01291261
5ccgte_demand203015689.31610.01291261
6ccgte_demand203016707.39810.01291261
7ccgte_demand203017745.52610.01291261
8ccgte_demand203018714.80710.01291261
9ccgte_demand203019610.76310.01291261
10ccgte_demand2030110510.41410.01291261

Next, use Julia's Split-Apply-Combine approach to group the dataframe into smaller ones. Each grouped dataframe contains a single data point for one base period and all RPs it maps to. Then multiply the results by weights and add them up.

gdf = groupby(df, [:from_asset, :to_asset, :milestone_year, :period, :timestep])
result_df = combine(gdf, [:weight, :solution] => ((w, s) -> sum(w .* s)) => :solution)
# display the first 10 rows of the result dataframe
first(result_df, 10)
10×6 DataFrame
Rowfrom_assetto_assetmilestone_yearperiodtimestepsolution
StringStringInt32Int32Int32Float64
1ccgte_demand20301120.5215
2ccgte_demand20301220.9858
3ccgte_demand20301321.5406
4ccgte_demand20301422.9244
5ccgte_demand20301527.4877
6ccgte_demand20301629.1971
7ccgte_demand20301736.9363
8ccgte_demand20301848.6965
9ccgte_demand20301951.298
10ccgte_demand203011057.7317

Now you can plot the results. Remove the period data since you don't need it anymore, and re-sort the data to make sure it is in the right order.

TC.combine_periods!(result_df)
sort!(result_df, :timestep)

plot(
    result_df.timestep,
    result_df.solution;
    label=string(from_asset, " -> ", to_asset),
    xlabel="Hour",
    ylabel="[MWh]",
    marker=:circle,
    markersize=2,
    xlims=(1, 168),
)
Example block output

This concludes this tutorial! Play around with different parameters to see how the results change. For example, when you use :dirac vs :convex weights, do you see the difference? How does the solution change as you increase the number of RPs?

Working with the New Tables Created by TulipaClustering

You can check the new tables with TulipaIO, for example, let's have a look at the first 10 rows of the new rep_periods_mapping table:

first(TIO.get_table(connection,"rep_periods_mapping"), 10)
10×5 DataFrame
Rowmilestone_yearperiodrep_periodweightscenario
Int32Int32Int32Float64Int32
12030110.01291261
22030120.9024971
32030130.08459091
42030210.03008561
52030220.8858871
62030230.03162051
72030240.05240691
82030320.8758741
92030330.1241261
102030420.1756971

If you want to save the intermediary tables created by the clustering, you can do this with DuckDB:

DuckDB.execute(
    connection,
    "COPY 'profiles_rep_periods' TO 'profiles-rep-periods.csv' (HEADER, DELIMITER ',')",
)
(Count = [288],)

Remember, the new tables out from the TulipaClustering are:

  • profiles_rep_periods
  • rep_periods_data
  • rep_periods_mapping
  • timeframe_data

This is useful when you don't have to rerun the clustering every time.

Tip

If you want to create the tables programmatically using only one dummy representative period for the whole year, you can use the dummy_cluster! function from the TulipaClustering package. We will use this function in the next tutorial to create a benchmark with hourly data.