Time Series Plots in Tucan
# Uncomment the next line to run against the local checkout (path dependency)
# System.put_env("TUCAN_DEV", "true")
tucan_version =
  case System.get_env("TUCAN_DEV") do
    nil -> "~> 0.3.0"
    _other -> [path: Path.expand("..", __DIR__)]
  end

Mix.install([
  {:tucan, tucan_version},
  {:kino_vega_lite, "~> 0.1.10"}
])
alias VegaLite, as: Vl
Introduction
This tutorial adapts Basic Time Series Plots in Vega-Lite by Jon E. Froehlich to Tucan.
We will explore creating visualizations of Seattle's daily maximum temperature showing how to use position, color, and size to create multi-dimensional plots. We will create temporal scatter plots, dot/strip plots, heatmaps, and bubble plots.
We will then show how to use Tucan's functionalities to calculate and plot average monthly weather data.
Our first visualization - scatter
Throughout this notebook we will use the :weather dataset that comes with Tucan.
We can start by using Tucan.scatter/4 to plot all dataset points. Let's encode time on the x-axis and temp_max on the y-axis. By default, scatter expects two quantitative variables, so we have to specify that the x-axis encodes time by explicitly setting the type to temporal through the :x option.
We set the :filled option to true to draw filled points, and enable the :tooltip option to add some interactivity.
max_temp_scatter =
  Tucan.scatter(:weather, "date", "temp_max", filled: true, x: [type: :temporal], tooltip: true)
  |> Tucan.set_size(700, 400)
  |> Tucan.set_title("Daily Max Temperatures in Seattle 2012 - 2015")
  |> Tucan.Axes.set_y_title("Max Temperature")
Semantic grouping by color
We can also add weather as an additional encoding channel using color. To control the color palette, we set the color scale.
Since we want to control the visual values, we specify a range. We carefully choose a palette that has semantic meaning: yellow for sun, gray for fog, blue for rain, etc.
You can either pipe the previous plot through Tucan.color_by/3 or pass the :color_by option directly to the Tucan.scatter/4 call.
color_palette = ["#aec7e8", "#c7c7c7", "#1f77b4", "#9467bd", "#e7ba52"]
Tucan.color_by(max_temp_scatter, "weather")
|> Tucan.Scale.set_color_scheme(color_palette)
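As a sketch of the second approach mentioned above, the :color_by option can be passed straight to Tucan.scatter/4. The Mix.install call repeats the notebook's non-dev setup dependencies so the cell can also run on its own, and the name colored_scatter is only illustrative.

```elixir
# Same dependency list as the setup cell (non-dev mode); calling Mix.install
# again with an identical list does nothing.
Mix.install([
  {:tucan, "~> 0.3.0"},
  {:kino_vega_lite, "~> 0.1.10"}
])

color_palette = ["#aec7e8", "#c7c7c7", "#1f77b4", "#9467bd", "#e7ba52"]

# Pass :color_by as an option instead of piping through Tucan.color_by/3.
colored_scatter =
  Tucan.scatter(:weather, "date", "temp_max",
    x: [type: :temporal],
    color_by: "weather",
    filled: true,
    tooltip: true
  )
  |> Tucan.Scale.set_color_scheme(color_palette)
```

Both forms should produce an equivalent color encoding; the option form is handy when the grouping is known at the time the plot is created.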
Semantic grouping by size
We can also control the size of the points by a fourth variable. Let's use precipitation for this. Notice that we also set the range of the size encoding in order to ensure that all points are included in the graph.
Notice how the tooltip content changes with respect to the encoded parameters.
max_temp_scatter
|> Tucan.color_by("weather", scale: [range: color_palette])
|> Tucan.size_by("precipitation", scale: [range: [5, 350]])
Aggregating around month
To track seasonal patterns, let's aggregate the temporal dimension around the monthdate time unit, which is sensitive to month and date but not year. This makes it useful for binning time values when looking for seasonal patterns.
Let's also format the x-axis dates to print the abbreviated month name. See Vega-Lite's text format documentation for more details. We will use the Tucan.Axes.put_options/3 helper, which can set any option on the given axis.
Also notice that the :y option (like all encoding options) can be used to set any arbitrary option on the y encoding channel.
aggregated_scatter =
  Tucan.scatter(:weather, "date", "temp_max",
    x: [type: :temporal, time_unit: :monthdate],
    y: [aggregate: :mean],
    width: 700,
    height: 400
  )
  |> Tucan.color_by("weather", scale: [range: color_palette])
  |> Tucan.size_by("precipitation", type: :quantitative, scale: [range: [5, 350]])
  |> Tucan.set_title("Daily Max Temperatures in Seattle 2012 - 2015")
  |> Tucan.Axes.set_y_title("Max Temperature")
  |> Tucan.Axes.set_x_title("Aggregated Months (2012-2015)")
  |> Tucan.Axes.put_options(:x, format: "%b")
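To make the monthdate aggregation concrete, here is a minimal sketch in plain Elixir of the temporal grouping it implies: readings are grouped by month and day, ignoring the year, and then averaged. The readings are made up for illustration and are not taken from the dataset. The sketch also ignores the extra grouping that the color and size encodings add in the actual plot.

```elixir
# Made-up {date, temp_max} readings, for illustration only.
readings = [
  {~D[2012-01-01], 10.0},
  {~D[2013-01-01], 6.0},
  {~D[2012-07-04], 24.0},
  {~D[2014-07-04], 28.0}
]

# Group by {month, day} so that the same calendar day of different
# years lands in the same bucket, then average each bucket.
monthdate_means =
  readings
  |> Enum.group_by(
    fn {date, _temp} -> {date.month, date.day} end,
    fn {_date, temp} -> temp end
  )
  |> Map.new(fn {key, temps} -> {key, Enum.sum(temps) / length(temps)} end)

IO.inspect(monthdate_means)
```

With these four readings the result has two buckets, one for January 1 and one for July 4, each holding the mean over the years in which that day appears.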
Faceting the plot
We can use the Tucan.facet_by/4 function to split the plot into small multiples. Let's split the above graph by weather in order to highlight weather-based temporal trends.
You can use various helper functions to modify things like the legend position, or to reset the dimensions of a plot or the axes titles.
aggregated_scatter
|> Tucan.facet_by(:column, "weather")
|> Tucan.set_size(130, 140)
|> Tucan.Axes.set_x_title("Month")
|> Tucan.Legend.set_orientation(:color, "bottom")
|> Tucan.Legend.set_orientation(:size, "bottom")
Combining two plots using concatenation
We can use Tucan's concat functions to combine plots vertically or horizontally. This is a form of view composition. We can combine plots horizontally with Tucan.hconcat/2, vertically with Tucan.vconcat/2, or via a general wrappable Tucan.concat/2.
Let's combine a bar-based frequency plot with the temporal plot above.
frequencies =
Tucan.countplot(:weather, "weather", orient: :vertical, width: 700)
|> Tucan.color_by("weather", scale: [range: color_palette])
|> Tucan.Axes.set_x_title("Num of Days with Weather (2012-2015)")
Tucan.vconcat([aggregated_scatter, frequencies])
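For comparison, a horizontal layout could look like the sketch below. It assumes Tucan.hconcat/2 can be called with a list of plots in the same way as Tucan.vconcat/2 above. The plots are deliberately simplified, and the names left, right and side_by_side are illustrative.

```elixir
# Same dependency list as the setup cell (non-dev mode); calling Mix.install
# again with an identical list does nothing.
Mix.install([
  {:tucan, "~> 0.3.0"},
  {:kino_vega_lite, "~> 0.1.10"}
])

# Two narrow plots so that they fit next to each other.
left = Tucan.scatter(:weather, "date", "temp_max", x: [type: :temporal], width: 300)
right = Tucan.countplot(:weather, "weather", width: 300)

# Assumption: hconcat takes a list of plots, mirroring the vconcat call above.
side_by_side = Tucan.hconcat([left, right])
```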
Stripplot
A strip plot is another way to explore variations in weather over time. In this case, let's encode the date by month along the x-axis, the date by year on the y-axis, and weather via color.
strip =
Tucan.stripplot(:weather, "date",
group_by: "date",
x: [time_unit: :monthdate, type: :temporal],
y: [time_unit: :year, type: :ordinal],
width: 700,
fill_opacity: 1
)
|> Tucan.color_by("weather", scale: [range: color_palette])
|> Tucan.Axes.set_y_title("Year")
|> Tucan.Axes.set_x_title("Month")
|> Tucan.Axes.put_options(:x, format: "%b")
We could add a fourth dimension by encoding the tick size as a function of the temp_max field for that day.
strip
|> Tucan.size_by("temp_max")
|> Tucan.set_height(140)
You could also change the :style to :jitter to plot jittered points instead of ticks.
Tucan.stripplot(:weather, "date",
group_by: "date",
style: :jitter,
x: [time_unit: :monthdate, type: :temporal],
y: [time_unit: :year, type: :ordinal],
fill_opacity: 1
)
|> Tucan.color_by("weather", scale: [range: color_palette])
|> Tucan.Axes.set_y_title("Year")
|> Tucan.Axes.set_x_title("Month")
|> Tucan.Axes.put_options(:x, format: "%b")
|> Tucan.set_size(700, 300)
Strip plot split by weather
Rather than encoding year on the y-axis, let's encode the weather.
Tucan.stripplot(:weather, "date",
group_by: "weather",
x: [time_unit: :monthdate, type: :temporal],
width: 700,
fill_opacity: 1
)
|> Tucan.color_by("weather", scale: [range: color_palette])
|> Tucan.Axes.set_y_title("Weather")
|> Tucan.Axes.set_x_title("Month")
|> Tucan.Axes.put_options(:x, format: "%b")
Distribution plots
Tucan provides several plot types for displaying the distribution of a numerical variable. Let's start by plotting the boxplots of temp_max for the various weather types.
Tucan.boxplot(:weather, "temp_max", group_by: "weather")
|> Tucan.color_by("weather", scale: [range: color_palette])
|> Tucan.set_width(500)
We could also use Tucan.errorbar/3.
Tucan.errorbar(:weather, "temp_max",
group_by: "weather",
points: true,
ticks: true,
extent: :ci
)
|> Tucan.color_by("weather", scale: [range: color_palette])
|> Tucan.set_width(500)
Similarly, we could use Tucan.stripplot/3, this time encoding temp_max on the x-axis. Notice that we use uniform jittering.
Tucan.stripplot(:weather, "temp_max", group_by: "weather", style: :jitter, jitter_mode: :uniform)
|> Tucan.color_by("weather", scale: [range: color_palette])
|> Tucan.set_width(500)
|> Tucan.set_height(300)
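The difference between jittering modes can be illustrated conceptually in plain Elixir. This sketch only contrasts uniformly distributed offsets with normally distributed ones. It is an assumption that these correspond to the available jitter modes, and it does not reproduce how Tucan computes or scales the offsets internally.

```elixir
# Seed the generator so repeated runs produce the same offsets.
:rand.seed(:exsss, {1, 2, 3})

# Uniform offsets are spread evenly over the interval [0.0, 1.0).
uniform_offsets = for _ <- 1..5, do: :rand.uniform()

# Normal offsets cluster around 0.0 and occasionally land further away.
normal_offsets = for _ <- 1..5, do: :rand.normal()

IO.inspect(uniform_offsets, label: "uniform")
IO.inspect(normal_offsets, label: "normal")
```

Uniform offsets fill each band evenly, while normally distributed offsets concentrate the points near the center of the band.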
Weather heatmaps
A heatmap uses color to encode the magnitude of a value.
Let's use a heatmap to examine how Seattle's max temperature changes over the year. For this, we will encode the date by day along the x-axis, the date by month along the y-axis, and the average temp_max as color.
heatmap =
Tucan.heatmap(:weather, "date", "date", "temp_max",
x: [time_unit: :date],
y: [time_unit: :month],
color: [aggregate: :mean]
)
|> Tucan.set_title("Heatmap of Avg Max Temperatures in Seattle (2012-2015)")
|> Tucan.Axes.set_x_title("Day")
|> Tucan.Axes.set_y_title("Month")
|> Tucan.Legend.set_title(:color, "Avg Max Temp")
Changing the heatmap color scheme
Let's change the color scheme to something more semantic with high temperature values mapped to red and low temperature values mapped to blue. See the Vega-Lite color scheme documentation for more details on available schemes.
The only change to the previous plot is the :color encoding scheme. We can use the Tucan.Scale.set_color_scheme/3 helper to easily set or change the color scheme of an existing plot. We use the reverse: true option in order to make the low temperatures blue and the high temperatures red.
Tucan.Scale.set_color_scheme(heatmap, :redyellowblue, reverse: true)
Adding labels to the heatmap cells
You can add annotations by setting the :annotate option to true. This will add a :text encoding to the plot, which you can configure like all other encodings through the :text option. You can also conditionally color the text using :text_color.
Tucan.heatmap(:weather, "date", "date", "temp_max",
annotate: true,
x: [time_unit: :date],
y: [time_unit: :month],
color: [aggregate: :mean],
text: [format: ".1f"],
text_color: [{nil, 9, "white"}, {25, nil, "white"}]
)
|> Tucan.set_title("Heatmap of Avg Max Temperatures in Seattle (2012-2015)")
|> Tucan.Axes.set_x_title("Day")
|> Tucan.Axes.set_y_title("Month")
|> Tucan.Legend.set_title(:color, "Avg Max Temp")
|> Tucan.Scale.set_color_scheme(:redyellowblue, reverse: true)
|> Tucan.set_width(800)
Punchcard Plots
Similar to the heatmap example, we can also make a punchcard to explore temporal Seattle weather patterns.
We'll use roughly the same encodings as before but this time map the average temp_max to circle size.
Tucan.punchcard(:weather, "date", "date", "temp_max",
x: [time_unit: :date],
y: [time_unit: :month],
size: [aggregate: :mean]
)
|> Tucan.Axes.set_x_title("Day")
|> Tucan.Axes.set_y_title("Month")
|> Tucan.Legend.set_title(:size, "Avg Max Temp")
Add dual encoding of temp max
We could also set a dual encoding where both color and size encode the average temp_max field. Notice that since this is a layered plot we also need to set recursive: true in the Tucan.color_by/3 call.
Tucan.punchcard(:weather, "date", "date", "temp_max",
x: [time_unit: :date],
y: [time_unit: :month],
size: [aggregate: :mean]
)
|> Tucan.color_by("temp_max", aggregate: :mean, recursive: true)
|> Tucan.Scale.set_color_scheme(:redyellowblue, reverse: true)
|> Tucan.Axes.set_x_title("Day")
|> Tucan.Axes.set_y_title("Month")
|> Tucan.Legend.set_title(:size, "Avg Max Temp")
Encode precipitation as mark size
Alternatively, we could encode precipitation as mark size.
Tucan.punchcard(:weather, "date", "date", "precipitation",
x: [time_unit: :date],
y: [time_unit: :month],
size: [aggregate: :mean]
)
|> Tucan.Axes.set_x_title("Day")
|> Tucan.Axes.set_y_title("Month")
Encode both temperature and precipitation
You can encode both temperature and precipitation in the same punchcard plot by using both color and size encodings.
We are also using the Tucan.Legend.set_orientation/3 helper to change the position of the two legends.
Tucan.punchcard(:weather, "date", "date", "precipitation",
x: [time_unit: :date],
y: [time_unit: :month],
size: [aggregate: :mean]
)
|> Tucan.color_by("temp_max", aggregate: :mean, type: :quantitative, recursive: true)
|> Tucan.Scale.set_color_scheme(:redyellowblue, reverse: true)
|> Tucan.Axes.set_x_title("Day")
|> Tucan.Axes.set_y_title("Month")
|> Tucan.Legend.set_orientation(:color, "top")
|> Tucan.Legend.set_orientation(:size, "top")
Lineplots - Plotting average monthly temperatures
While it is useful to graph the raw data as above, aggregating our data along higher-order temporal dimensions like month or year might help highlight additional trends. Let's try graphing the average maximum temperature in Seattle by month from 2012-2015.
We can do this by combining Vega-Lite's :time_unit and :aggregate properties. We will specify that we want the x-axis in months and that we want to aggregate using the mean.
We will use Tucan.lineplot/4 to plot the seasonal trend. By setting points: true we overlay the points on the line plot.
avg_temperature =
Tucan.lineplot(:weather, "date", "temp_max",
x: [time_unit: :month, type: :temporal],
y: [aggregate: :mean],
points: true,
tooltip: true,
width: 700
)
|> Tucan.set_title("Average Daily Max Temperatures in Seattle (2012-2015) by Month")
Adding annotations for daily max average
We can add an average line using Tucan.hruler/3. We can add a line either at a specific y-axis point or at a calculated point.
Tucan.hruler(avg_temperature, "temp_max", line_color: "red")
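The call above places the line at a calculated point. A line at a specific y-axis point could be sketched as follows, assuming Tucan.hruler/3 accepts a number as the position, as the sentence above suggests. The value 15 is an arbitrary threshold picked for illustration, and the name with_fixed_ruler is hypothetical.

```elixir
# Same dependency list as the setup cell (non-dev mode); calling Mix.install
# again with an identical list does nothing.
Mix.install([
  {:tucan, "~> 0.3.0"},
  {:kino_vega_lite, "~> 0.1.10"}
])

avg_temperature =
  Tucan.lineplot(:weather, "date", "temp_max",
    x: [time_unit: :month, type: :temporal],
    y: [aggregate: :mean],
    points: true
  )

# Assumption: a numeric position draws the ruler at that fixed y value.
# 15 is an arbitrary value, not something derived from the data.
with_fixed_ruler = Tucan.hruler(avg_temperature, 15, line_color: "red")
```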
Lines by weather type
Similarly to all other plots, we can use color_by to plot one line per weather type. Additionally, we will use stroke_dash_by to make the plot more accessible.
Tucan.lineplot(:weather, "date", "temp_max",
x: [time_unit: :month, type: :temporal],
y: [aggregate: :mean],
points: true,
tooltip: true
)
|> Tucan.color_by("weather")
|> Tucan.Scale.set_color_scheme(color_palette)
|> Tucan.stroke_dash_by("weather")
|> Tucan.set_title("Average Daily Max Temperatures in Seattle (2012-2015) by Month & Weather")
|> Tucan.set_size(700, 350)
Showing the confidence interval
We can use Tucan.errorband/4 to plot the confidence interval. We also color by the weather type in order to display the max temperature intervals by weather type.
Tucan.errorband(:weather, "date", "temp_max", x: [time_unit: :month, type: :temporal])
|> Tucan.color_by("weather")
|> Tucan.Scale.set_color_scheme(color_palette)
|> Tucan.set_title("Daily Max Temperatures in Seattle (2012-2015) by Month & Weather")
|> Tucan.set_size(700, 350)
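As background on what such a band represents, Vega-Lite describes its confidence-interval extent as a bootstrapped 95% confidence interval of the mean. The plain-Elixir sketch below shows the general bootstrap idea on made-up temperatures. It is not Tucan's or Vega-Lite's implementation, and the sample values, the seed and the number of resamples are all arbitrary choices.

```elixir
# Made-up temperatures for illustration only (not taken from the dataset).
temps = [10.0, 12.5, 9.5, 14.0, 11.0, 13.5, 10.5, 12.0]

# Seed the generator so repeated runs give the same interval.
:rand.seed(:exsss, {1, 2, 3})

# Draw a sample of the same size with replacement and return its mean.
resample_mean = fn values ->
  sample = for _ <- values, do: Enum.random(values)
  Enum.sum(sample) / length(sample)
end

# Repeat the resampling many times and sort the resulting means.
sorted_means =
  1..1000
  |> Enum.map(fn _ -> resample_mean.(temps) end)
  |> Enum.sort()

# Take the approximate 2.5% and 97.5% points of the resampled means.
lower = Enum.at(sorted_means, 24)
upper = Enum.at(sorted_means, 974)

IO.inspect({lower, upper}, label: "approximate 95% CI of the mean")
```

A narrow band therefore indicates that the monthly mean is estimated from many consistent readings, while a wide band points to few or highly variable readings.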
Usually you want to overlay the confidence intervals with the trend lines. You can achieve this by using the Tucan.layers/2 helper.
errorbands =
  Tucan.errorband(:weather, "date", "temp_max", x: [time_unit: :month, type: :temporal])

trend_lines =
  Tucan.lineplot(:weather, "date", "temp_max",
    x: [time_unit: :month, type: :temporal],
    y: [aggregate: :mean],
    points: true,
    tooltip: true
  )

Tucan.layers([errorbands, trend_lines])
|> Tucan.set_title("Daily Max Temperatures in Seattle (2012-2015) by Month & Weather")
|> Tucan.set_size(700, 350)
|> Tucan.color_by("weather", recursive: true)
|> Tucan.Scale.set_color_scheme(color_palette)
Composite Plots
Tucan.jointplot(:weather, "temp_max", "precipitation",
color_by: "weather",
ratio: 0.3,
fill_opacity: 0.5
)
Tucan.pairplot(:weather, ["temp_min", "temp_max", "precipitation", "wind"], diagonal: :histogram)
|> Tucan.color_by("weather", recursive: true)
|> Tucan.Legend.set_orientation(:color, "top")
# |> Tucan.Scale.set_color_scheme(color_palette)