Sunday, April 26, 2026

Continuous Graphs

We already established that I cannot focus my attention for too long, or maybe even long enough, on any given topic. Let me just jot down here the ideas that crossed my mind on one of those trips.

A favorite topic of mine is the problem of unsupervised clustering, in particular the role of symmetry groups in clustering given a weighted graph of  "affinities". 

I archived this problem long ago as I wasn't making progress, among other things because of my merely rudimentary knowledge of Representation Theory. I only revisit it when I come across an idea that I find suggestive for possibly unsticking that work.

When doing so, and in particular for this pet problem of mine, I proceed very loosely, considering ideas however flimsy their connection may be. I only care that they are "suggestive".

In this sense, I never pay much respect to the distinctions made in the literature between keywords such as "classification", "clustering", "community finding", "partitioning", "grouping", etc., or even "tessellation", "phase separation", "fragmentation" or "pattern formation". Sure, their contexts and problem settings are different, but they also share a key pattern, namely, finding "heterogeneities" in what before was a "homogeneous" class of objects.

One such triggering idea was that classification is akin to finding (and computing!) a mapping that tells us whether two such objects satisfy the same set of properties; alternatively, the canonical mapping giving the "group" each of those objects "belongs to" (see Avi Wigderson, Mathematics and Computation, 2017, p. 19).

Can we extend this to our case?  Would it be the case that:

1.- Clustering  $\iff$ a computation/program/function ?

2.- Clustering entails a model of computation?

For instance, we may consider that the partition $\bar 0$ of all _singletons_ would naturally map to the identity function. Similarly, the partition $\bar{\mathbb 1}$, with all elements in a single block, would map to a constant function.
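As a toy illustration on a finite set (a minimal sketch; the function and variable names are mine), a partition induces a canonical map sending each element to a representative of its block, and the two extreme partitions recover exactly the identity and a constant function:

```python
# Sketch: the canonical map induced by a partition of a finite set,
# sending each element to the representative (first element) of its block.
def quotient_map(partition):
    return {x: block[0] for block in partition for x in block}

elems = [0, 1, 2, 3]
singletons = [[x] for x in elems]   # the partition 0-bar: all singletons
one_block = [elems]                 # the partition 1-bar: a single block

print(quotient_map(singletons))  # -> {0: 0, 1: 1, 2: 2, 3: 3}, the identity
print(quotient_map(one_block))   # -> {0: 0, 1: 0, 2: 0, 3: 0}, a constant map
```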

We may look at 1) from a more general point of view (something you could consider a common strategy when trying to solve a problem: a more general setting may sometimes be more amenable to analysis and solution). Case in point, in Mathematics and Physics, especially in the latter, one very often generalizes a discrete problem to a continuum one, e.g., translating discrete sums $\sum_i\,f(i)$ into integrals $\int\,dx\,f(x)$.

How can we cluster a continuous set? In the discrete case we have a finite set of vertices $\mathbb V$ and a finite set of edges $\mathbb{E}\,\subset \, \mathbb{V}\times\mathbb{V}$.

How could we define a continuous graph?

One possibility is to proceed in a similar way to the discrete case. 

Let $\mathbb V\,=\,\left[0\,,\,1\right]$ be the continuous set of vertices and $f\,:\,\mathbb{V}\,\to\,\mathbb{I}\,\subset\,\mathbb{V}$ an injective continuous function. Then we can define a continuous directed graph as $\mathbb{G}\,=\,\left(\mathbb{V}\,,\,\mathbb{E}_f\right)$, where the continuous set of edges is the graph of $f$, $\mathbb{E}_f\,=\,\left\{\left(x\,,\,f(x)\right)\,:\,x\,\in\,\mathbb{V}\right\}\,\subset\,\mathbb{V}\times\mathbb{I}$.

Consider, e.g., $f(x)=(2\,x\,+\,1)/4$; the graph $\mathbb{G}$ is described by the plot of $f$, which gives us its edge-set $\mathbb{E}_f$.

For a different edge-function, $g(x)=\frac{-3}{4(2x-3)}$, we get a different continuous graph with edge-set $\mathbb{E}_g$.

Any flow on these directed graphs ends on an attractor vertex, given by $x^*=1/2$ and $x^*=(3-\sqrt{3})/4$ for $f$ and $g$, respectively.
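As a quick numerical check (a minimal sketch; the function names are mine), one can follow such a flow by iterating the edge-function and watch it settle on the claimed attractors:

```python
# Sketch: follow the directed edges by iterating the edge-function,
# starting from an arbitrary vertex in [0, 1].
def f(x):
    return (2 * x + 1) / 4

def g(x):
    return -3 / (4 * (2 * x - 3))

def flow(step, x0, n=100):
    """Hop along edges n times starting from vertex x0."""
    x = x0
    for _ in range(n):
        x = step(x)
    return x

print(flow(f, 0.0))  # -> 0.5
print(flow(g, 0.0))  # -> approx. (3 - sqrt(3)) / 4 = 0.31698...
```

Both maps are contractions on $[0,1]$, so the starting vertex does not matter.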

These two examples are continuous graphs determined by a regular function. But we can envision other continuous graphs that can't be defined by a regular function. Some sketches will make it clearer.





In general, we will have a continuous, weighted, directed graph with an edge-manifold implicitly defined by some $F(x\,,\,y\,,\,z)\,=\,0$. Basically, extrude those plots vertically along the axis perpendicular to the screen. Any point $P(x,y,0)$ on the plane $z=0$ defines whether two vertices $x$ and $y$ are connected with an edge from $x$ to $y$. The height above that plane determines the weight of the edge, $W(x,y)$.

These graphs can contain both continuous and discrete components:
$\mathbb{E}-\mbox{plot}\equiv\epsilon(x,y)=\sum_c\,W_c(x,y)\,+\,\sum_{x_0}\delta(x-x_0)\delta(y-y(x_0))$
where the first term sums over all continuous components and the second over the discrete set of edges with source $x_0$ and target $y_0\equiv y(x_0)$.

The associated adjacency plot could be defined as
$A-\mbox{plot}\equiv{\cal A}(x,y)=\sum_c\,\theta\big(W_c(x,y)\,+\,W_c(y,x)\big)$
where $\theta$ is the Heaviside step function and, for simplicity, we have ignored the discrete component and assumed all edge weights are non-negative. The absence of an edge is encoded as $W_c(x,y)=0$, for lack of a better criterion.
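As a small sketch of the last formula (the weight function below is an assumed example, not from the text), the adjacency plot amounts to a Heaviside step applied to the symmetrized weight:

```python
# Sketch: adjacency as theta(W(x, y) + W(y, x)), ignoring the discrete
# component and assuming non-negative weights.  W is a hypothetical
# example weight: an edge from x to y only when y lies strictly below x.
def W(x, y):
    return max(0.0, x - y)

def heaviside(s):
    return 1.0 if s > 0 else 0.0

def adjacency(x, y):
    return heaviside(W(x, y) + W(y, x))

print(adjacency(0.7, 0.2))  # -> 1.0 (an edge in at least one direction)
print(adjacency(0.5, 0.5))  # -> 0.0 (no edge: both weights vanish)
```

Note that the symmetrization makes the adjacency independent of the edge direction, as one expects of an adjacency relation.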

Note: Writing these ideas down on the blog always takes more time than one expects. Unfortunately, I lost the original document where I first wrote this down, and now I only have a pdf version left. When exporting it to pdf, those first two plots fell between two pages; hence the thick black horizontal line.

Questions that I had:
1. What does function composition tell us about clustering?
2. What does clustering tell us about function composition?
3. How can clustering define composition of functions?

Friday, October 10, 2025

On the Impact of AI/LLMs in Mathematics and the Role of Embeddings

For a view on LLMs from the point of view of Mathematical research, watch Alex Kontorovich's recent talk at Harvard CMSA, The Shape of Math to Come [2].

If you want to skip the beginning, I'd suggest jumping to min 33 [1]. He does, however, make a reference to earlier slides when describing what speedup he's looking for when it comes to using LLMs to generate new mathematical results.

Tl;dr: He envisions a future of math hinging on two components: first, formalization in a language like Lean or similar (Agda, Coq, ...), and second, an AI component atop the former. In this case he only has in mind a generative one, namely an LLM (compare this to LeCun's Dino talk [3]). At min 33 he starts conveying the main idea of where he wants to get: what matters is the idea we want to express, not its formalization, except that we need a system or process ensuring the latter actually conveys the former correctly. Even if an LLM system like AlphaProof could generate new mathematical results (papers) that are fully correct and true in 99% of the cases, it would not be useful for advancing Mathematics, precisely because of that probabilistic nature of its correctness. But LLMs could help in the initial step of translating ideas into Lean and then finding a proof, which, by the typed nature of Lean, is correct if it "compiles". Therein could lie the real speedup in Mathematical research: researchers would be able to spend more time shaping their ideas, instead of having to dedicate quite a significant effort to proofreading their own derivations.

I wonder if one could not draw a direct analogy between the role of that formalization step in Lean and the underlying universal representation, the embedding step in AI parlance. In the following text I will be using embedding, representation, description and, to some extent, formalization interchangeably.

Here, formalization is meant as a graded concept: the same math result expressed in Lean would be 100% formalized, while in natural language it would be anywhere between 0% and, say, 1%; the point being that it's upper bounded.

In this sense I would say that describing some objects as vectors using an array of integers is in itself a formalization level, at the same time as an embedding. But you could be describing your vectors in a synthetic way, as geometrical objects (the arrows). Or you might be able to dispense with the idea of vectors altogether and describe your problem in a purely diagrammatic way. Mind you, all these descriptions belong to well formalized areas of Mathematics: Linear Algebra, Synthetic Geometry or, say, Clifford Algebra or Geometric Algebra, and (many) diagrammatic descriptions can be put within Category Theory.

Each description/representation of your world will have different computational advantages. Hence, using different formal math tools (say, from different math fields) could make a huge improvement in the results or usefulness of any given ML/AI tool. Maybe each of these AI tools has a different representation (description, embedding, formalization) where its use is optimal.

So, while, say, the area of Reinforcement Learning (RL) is currently stuck, that could change if we find a better description suitable for it. The same applies to a practical problem you may be stuck on while trying to solve it with some ML/DL tool: changing your world description may lead you efficiently to its solution.

The latter is in fact a "universal" trick, and not only in Math or Physics: change the way you look at things and they can go from being a roadblock to boosting you forward.


Anyway, it's highly possible that all this is but incoherent rambling under the influence of a little bug I'm currently evicting.


Note: In Kontorovich's video, image and sound are badly out of sync; however, sound and subtitles are in sync.

[1] min 33: https://youtu.be/xIG1RI44Nog?t=2002

[2] full talk: https://www.youtube.com/watch?v=xIG1RI44Nog

[3] LeCun's talk on Dino3: https://www.youtube.com/watch?v=yUmDRxV0krg


PS: Talking about different descriptions, YouTube just fed me this other recent video, "Towards a Geometric Theory of Deep Learning" by Govind Menon. Maybe worth a closer look another time.


PS2: The Internet gods pointed me to another video that would perfectly match as a second part to Kontorovich's: "Where is mathematics going (September 24, 2025)" by Kevin Buzzard, sponsored by the Simons Foundation.

Monday, October 6, 2025

On "The Bitter Lesson" by Richard Sutton

Just found an interview with Richard Sutton referencing his "Bitter Lesson" post from 2019. See link at the bottom.

He claims to be making two points. I think his first one is a useful view for considering which AI topic might be more fruitful in the long run. Here, long run is meant on a personal level, not on the long run of Science. Read on.


The successful approaches have been based fundamentally on just "search" and "learning". These are what are often criticized as "simple brute-force approaches".


This approach has been the one leading to the current widespread adoption and surprising effectiveness of RL (exemplified by solutions like AlphaGo and AlphaZero), Deep Learning and LLMs.


And this success should be contrasted with the comparatively failed attempts at implementing techniques built on our own understanding of how we solve those same tasks or of how we learn.


In summary, trying to come up with techniques based on our previous world models fared way worse than techniques based on simple, general brute-force methods.


As he mentions in that interview, these claims are meant but as a reflection of what has or has not worked out in the past 70 or so years. He makes no claim this will be the case in the future.


His second claim consists of his very last paragraph. However, I'm not certain if his point here isn't just a rephrasing of the previous one. 


Neither am I convinced he is consistent: one might argue that what he thinks of as "search & learning" are but idealizations, i.e., models of methods created by our minds. In particular, I think that what he considers "learning" is simply RL, reinforcement learning.


But this method is nothing but human knowledge that we would be putting into the AI, which is the approach he seems to be arguing against in this second point.


His only argument around this seems to be that he considers these two processes simple enough to serve as guiding principles. They do fit with the view of his first point, though.


References

https://en.wikipedia.org/wiki/Richard_S._Sutton 

http://www.incompleteideas.net/IncIdeas/BitterLesson.html


Tuesday, August 26, 2025

The "homotopic"/Geometric Nature of Mathematical Induction

Mathematical Induction for the Natural Numbers ${\mathbb N}$ can be expressed as the following inference rule within Martin-Löf's Dependent Type Theory:

$\Gamma\,,\,x:{\mathbb N}\,\vdash\,P(x)\;Type$

$\Gamma \,\vdash\,p_0\,:\,P(0_{\mathbb N})$

$\Gamma \,\vdash\,p_s\,:\,\Pi_{(n:{\mathbb N})}\,P(n)\,\to\,P(succ_{\mathbb N}(n))$

 $-----------------\;{\mathbb N} ind$ 

$\Gamma \,\vdash\, ind_{\mathbb N}(p_0\,,\,p_s) \,:\, \Pi_{(n:{\mathbb N})}\,P(n)$

with the computation rules (where we implicitly assume the same premises):

$ind_{\mathbb N}(p_0\,,\,p_s\,,\,0_{\mathbb N})\,\doteq\,p_0\,:\,P(0_{\mathbb N})$

$ind_{\mathbb N}(p_0\,,\,p_s\,,\, succ(n) ) \,\doteq\, p_s(n\, ,\, ind_{\mathbb N}(p_0\,,\,p_s\,,\, n )) \,:\, P(succ(n))$

Together this can be expressed graphically as  


Thus, induction provides a way to build a function for free on an Inductive Type: if we have a way to hop from one fiber to "its neighboring" fiber, then, given a point on a starting fiber, we can build a section in the total space of that Type.

That way of "hopping" between fibers is nothing else than the inductive step! 
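The rule above has a direct concrete counterpart (a sketch in Lean 4; the identifiers `indNat`, `p0`, `ps` are mine): the built-in recursor `Nat.rec` plays the role of $ind_{\mathbb N}$, taking the base point and the fiber-hopping step and returning a full section.

```lean
-- Sketch: Nat.rec is Lean's built-in induction principle for Nat,
-- playing the role of ind_N above.  Given p0 : P 0 and an inductive
-- step ps, it produces a dependent function (a section) (n : Nat) → P n.
def indNat (P : Nat → Prop) (p0 : P 0)
    (ps : (n : Nat) → P n → P (n + 1)) : (n : Nat) → P n :=
  fun n => Nat.rec (motive := P) p0 ps n

-- The first computation rule holds definitionally:
example (P : Nat → Prop) (p0 : P 0) (ps : (n : Nat) → P n → P (n + 1)) :
    indNat P p0 ps 0 = p0 := rfl
```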

A first analogy: a curve can be defined by a differential equation and an initial condition. This equation can be given by a vector field that expresses the tangent to the curve at each point.

Also, in Differential Geometry we have the Covariant derivative, which expresses how we can move a tensor between infinitesimally close points in order to compare it meaningfully at those two points. Imposing the "Covariant change" to be zero along a given curve gives us a particular way to move a tensor between infinitesimally close points such that it stays parallel. This is the parallel transport of a tensor.

We can do the following: parallel transport a vector field along the very same curve it defines.  This curve is nothing other than a geodesic.

I'm definitely tempted to draw here an analogy between these Differential Geometry constructions and the induction rule; I'm just not sure what the best analogy might be. As induction gives us a curve, is induction akin to building a geodesic through a point?

A more general analogy is the relation between induction and Sheaf theory. Paraphrasing an LLM, "both guarantee existence of global objects from local rules, though the technical frameworks differ: induction is sequential and algebraic; sheaf theory is topological and categorical".

Analogies aside, Dependent Type Theory through the lens of Homotopy Type Theory is very rich in intuitive pictures. I find this super cool!

Thursday, August 21, 2025

The Curry-Howard Correspondence

 Logic, from the point of view of Dependent Type Theory (DTT), hinges on the Curry-Howard interpretation/correspondence:

Any proposition/predicate is a Dependent Type (DT)

Take the following statement about the Natural numbers $\mathbb N$: $d\,\vert\,n\;\equiv$ "d divides n"

As we know from classical Number Theory, this is equivalent to saying (classical logic) $\exists\, k\,\in\,{\mathbb N}\;\vert\quad d\cdot k\,=\,n$.

Hence, the correspondence is 

$d\, \vert\, n\quad\leftrightarrow\quad\Sigma_{(k:\mathbb N)}d\cdot k\;=\;n$

On the LHS we have classical Logic/Number Theory; on the RHS we have Dependent Type Theory. Thus, in DTT the existential quantifier gets translated into a dependent sum.


But what is that equality on the RHS? Answer: $IdentPath_{\mathbb N}(d\cdot k\,,\,n)$, the Path type defined on the Natural numbers identifying both values.


Thus, any value $p$ of that sum type, $p\,:\,\Sigma_{k:\mathbb N}d\cdot k = n$, is a point $\left(k\, ,IdentPath_{\mathbb N}(d\cdot k\, , n)\right)$ that provides a proof of the divisibility of $n$ by $d$.

Let's see how we would prove a statement like "any natural number $x$ is divisible by 1". The universal quantifier from classical logic gets translated into the Product type:

$\Pi_{x:\mathbb N }\,1\vert x\quad\equiv\quad\Pi_{x:\mathbb N}\Sigma_{k:\mathbb N}\,1\cdot k = x$

In order to prove this we need to find/construct a "witness", an element $p\,:\,\Pi_{x:\mathbb N}1\vert x$. Incidentally, this is called a section of the space $\Sigma_{x:\mathbb N}1\vert x$.

Decomposing $p$ by applying it to a number $x$ we see that it must be $p(x):\Sigma_{k:\mathbb N}1\cdot k = x$. Again, this is of the form $(k, q(x))$, where $q(x):\,1\cdot k\,=\,x$.

Here is where we use the fact that, in a DTT formulation of the Natural Numbers, we would previously have constructed a path like $r(x)\,:\,1\cdot x\,=\,x$, which corresponds to the rule of left multiplication by the neutral element of the product in the Natural numbers.

Hence, we have the element $p(x) = \left(x\, ,r(x)\right)$, which is of the desired type $\Sigma_{k:\mathbb N}1\cdot k\,=\,x$. As $x$ is an arbitrary number, $p$ is the sought-for function $p\,:\,\Pi_{x:\mathbb N}\,1\vert x$.

We have thus proved in Dependent Type Theory that 1 divides any natural number.
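The same construction can be written down concretely (a sketch in Lean 4; the names `divides` and `one_divides` are mine, and Lean's `∃` is used as a stand-in for the dependent sum): the proof term is exactly the pair $\left(x\,,\,r(x)\right)$ from above, with `Nat.one_mul` playing the role of the path $r(x)$.

```lean
-- Sketch: divisibility as an existential over a witness k, and the
-- section proving that 1 divides any x, given by the pair (x, 1*x = x).
def divides (d n : Nat) : Prop := ∃ k, d * k = n

theorem one_divides : ∀ x : Nat, divides 1 x :=
  fun x => ⟨x, Nat.one_mul x⟩
```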


I thought these two examples could show the gist of how to think about logical problems and proofs in Dependent Type Theory.


Andrej Bauer's online course slides are great, and I'm starting to like quite a lot one of the books he recommends for further reading: "Introduction to Homotopy Type Theory" by Egbert Rijke, available for free at https://arxiv.org/pdf/2212.11082. These examples are from there.


Sunday, July 27, 2025

"Holography" in Euclidean Geometry

Let's make one last summary of all below.

Since I learned on my own to address problems in Euclidean geometry from a purely synthetic approach, I've been fascinated by it. If only I had learned this in my teenage years...

Yet this time the passion for a visual approach seems to have misled me down a winding rabbit hole, only to come out at what would have seemed trivial had I chosen a more analytical approach of counting degrees of freedom.

We need 6 numbers to specify the coordinates of three points in the plane. If it's a right triangle, only five coordinates are independent. 

But we don't have to stick to cartesian coordinates. 

If we fix one point at the origin, we need four numbers for the two remaining points, plus two more for defining the translation that put this triangle at the origin.

We can repeat that thought process and further constrain a second point to be on the vertical axis. We can bring it there by a rotation, after using a translation to fix the first point. This then leaves three coordinates for one point "and a half", plus a number for specifying the rotation we had to apply, and two more numbers for the translation. For a right triangle, the third point is constrained as well, in this case to be on the horizontal axis. Again it amounts to 6 and 5 dof, respectively.

If we fix the length of the "vertical" side to 1, we reduce the previous count of coordinates by one, which we use instead to specify a dilation. For right triangles we are then left with one horizontal coordinate, one scaling factor, one rotation, and the two numbers for a translation.

And this is nothing more than what all the rambling below amounts to.

So, yeah, we have reduced a right triangle to a single point on a line; the whole 2d figure "emerges" from that single point. But the number of degrees of freedom has not changed; we simply hid them in the form of transformations.
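The bookkeeping above can be tallied explicitly (a trivial sketch; the variable names are mine):

```python
# Sketch: degrees-of-freedom bookkeeping for triangles in the plane.
translation, rotation, dilation = 2, 1, 1

# Generic triangle: one vertex at the origin, one pinned to the vertical
# axis at height 1, leaving the third vertex free (2 coordinates).
generic = translation + rotation + dilation + 2
print(generic)  # -> 6, matching the 6 cartesian coordinates

# Right triangle: the third vertex is further pinned to the horizontal
# axis, leaving a single horizontal coordinate.
right = translation + rotation + dilation + 1
print(right)  # -> 5
```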

I think one take-home lesson is to always explore a problem from multiple perspectives, e.g., synthetic & analytic, very early on. Some paths offer much less resistance for making solid steps forward.

--

Saturday, April 19, 2025

On the foolishness of NL programming

 The title of this post is but that of an article by the late E.W. Dijkstra that is still fully valid in these days of frantic work on LLMs:

The virtue of formal texts is that their manipulations, in order to be legitimate, need to satisfy only a few simple rules; they are, when you come to think of it, an amazingly effective tool for ruling out all sorts of nonsense that, when we use our native tongues, are almost impossible to avoid.

Instead of regarding the obligation to use formal symbols as a burden, we should regard the convenience of using them as a privilege: thanks to them, school children can learn to do what in earlier days only genius could achieve. When all is said and told, the "naturalness" with which we use our native tongues boils down to the ease with which we can use them for making statements the nonsense of which is not obvious.

You can access it here.