Microcanonical: Continous Graphs

We already established that I cannot focus my attention for too long, or maybe even long enough, on any given topic. Let me just jot down here what ideas crossed my mind in one of those insomnia trips of May 2024, almost exactly two years ago -I wish I were better at writing things down.

A favorite topic of mine is the problem of unsupervised clustering, in particular the role of symmetry groups in clustering given a weighted graph of "affinities".

I archived this problem long ago as I wasn't making progress, among other things because my only rudimentary knowledge of Representation Theory. I only update it when I come across an idea that I find suggestive for possibly unstucking that work.

When doing so I, and in particular for this pet problem of mine, I proceed very loosely, considering ideas whatever flimsy connection they may have. I only care that they are "suggestive".

In this sense, I never pay much respect to the distinctions made in the literature between keywords as "classification", "clustering", "community finding", "partitioning", "grouping", etc., or even "tessellation", "phase separation", "fragmentation " or "pattern formation". Sure, their context and problem setting are different, but they also have a key shared pattern, namely, finding "heterogeneities" in what before was an "homogeneous" class of objects.

One such triggering idea was that classification is akin at finding (and computing!) a mapping that tells us whether two such objects satisfy the same set of properties; alternatively, the canonical mapping giving the "group" each of those objects "belong to" (see Avi Widgerson, Mathematics and Computation, 2017, page. 19).

Can we extend this to our case? Would it be the case that:

1.- Clustering $\iff$ a computation/program/function ?

2.- Clustering entails a model of computation?

For instance, we may consider that the partition $\bar 0$ of all _singletons_ would naturally map to the identity function. Similarly, the partition $\bar{\mathbb 1}$ would map to a constant function.

We may look at 1) from a more general point of view -something you could consider a common strategy when trying to solve a problem: a more general setting may sometimes be more amenable to analyzing and solving it. Case in point, in Mathematics and Physics, specially in the latter, very often one generalizes a discrete problem to a continuum one, e.g., translating discrete sums, $\sum_i\,f(i)$ into integrals $\int\,dx\,f(x)$.

How can we cluster a continuous set? In the discrete case we have a finite set of vertices $\mathbb V$ and a finite set of edges $E\,\subset \, \mathbb{V}\times\mathbb{V}$.

How could we define a continuous graph?

One possibility is to proceed in a similar way to the discrete case.

Be $\mathbb V\,=\,\left[0\,1\right]$ the continuous set of vertices, and $f\,:\,\mathbb{V}\,\to\,\mathbb{I}\,\subset\,\mathbb{V}$ an injective continuous function, then we can define a continuous directed graph as $\mathbb{G}\,=\,\left(\mathbb{V}\,,\,\mathbb{E}_f\right)$, where the continuous set of edges is $\mathbb{E}_f\subset\mathbb{V}\times\mathbb{I}$.

Consider, e.g., $f(x)=(2\,x\,+\,1)/4$, the graph $\mathbb{G}$ is described by the plot of $f$, which gives us its edge-set $\mathbb{E}_f$.

For a different edge-function, $g(x)=\frac{-3}{4(2x-3)}$, we get a different continuous graph with edge-set $\mathbb{E}_g$.

Any flow on these directed graphs end on an attractor vertex given by $x^*=1/2$ and $x^*=(3-\sqrt(3))/4$, for $f$ and $g$, respectively.

These two examples are continuous graphs determined by a regular function. But we can envision other continuous graphs that can't be defined by a regular function. Some sketches will make it clearer.

In general, we will have a continuous, weighted directed graph with an edge-manifold implicitly defined by $F(x,y(x),z(x,y))=0$. Basically, extrude those plots vertically along the axis perpendicular to the screen. Projecting any point of the edge-manifold $P(x,y,z)$ onto the plane $z=0$ determines whether two vertices $(x,y)$ are connected with an edge from $x$ to $y$. Alternatively, two vertices $x$ and $y$ are connected with an edge $(x,y)$ iff $F(x,y,z(x,y))=0$. The height above that plain determines the weight of the edge, $W(x,y)$.

These graph will contain both, continuous as well as discrete components:

$\mathbb{E}-\mbox{plot}\equiv\epsilon(x,y)=\sum_c\,W_c(x,y)\,+\,\sum_{x_0}\delta(x-x_0)\delta(y-y(x_0))$

where the first term sums over all continuous components and the second over the discrete set of edges with source $x_0$ and target $y_0\equiv y(x_0)$.

The associated adjacency graph could be defined as

$A-\mbox{plot}\equiv{\cal A}(x,y)=\sum_c\,\theta\big(\left\{W_c(x,y)\,+\,W_c(y,x)\right\}\big)$

where $\theta$ is the Heaviside step function and, for simplicity, we have ignored the discrete component and assumed all edge weights are non-negative. The absence of a weight is given by $W_c(x,y)=0$, for the lack of a better criterion.

Note: Writing these ideas down on the blog always takes more time than one expects. Unfortunately, I lost the original document where I wrote this down initially and now I have left only a pdf version. When exporting it to pdf, those first two plots fell between two pages. Thus, the black horizontal thick line.

Most likely this line of thought is a dead end. For instance, a curve $\gamma\subset f(\mathbb{V})\subset \mathbb{E}_f$ is a colection of edges that, in general, doesn't define say a flow, although in this case it does contain a flow $F\subset\gamma$.

Anyway, reconnecting with the initial idea, this lead to ask other questions that I had:

1. What does function composition tell us about clustering?

2. What does clustering tell us about function composition?

3. How can clustering define composition of functions?

EDIT: Of course, as it's usual with my insomnia triggered speculations, an idea of continuous graphs had already appeared in the literature before in the form of metric graphs, see Mugnolo's paper https://ar5iv.labs.arxiv.org/html/1912.07549 . The name continuous graph shows in a more recent paper referencing this one https://arxiv.org/html/2501.14554v1#bib.bib13 .

EDIT2 (29.04.2026): Graphons! I had totally missed this. It turns out the above continuous graphs I discuss are almost the same as the Continuous Graphs in the literature: Graphons, from "Graph"+"function". Instead of a "weight function" $W(x,y)\in\mathbb{R}$ for the edge $(x,y)$, they introduce a probabilistic description $W(U_x,U_y)\in [0,1]$ for there being an edge $(x,y)$. Here $U_{x,y}\in [0,1]$ are random latent variables associated to the source and target vertices, respectively.

Microcanonical

Sunday, April 26, 2026

Continous Graphs

No comments:

Post a Comment