Sunday, April 26, 2026

Continuous Graphs

We already established that I cannot focus my attention for too long, or maybe even long enough, on any given topic. Let me just jot down here the ideas that crossed my mind on one of those trips.

A favorite topic of mine is the problem of unsupervised clustering, in particular the role of symmetry groups in clustering given a weighted graph of  "affinities". 

I archived this problem long ago as I wasn't making progress, among other things because of my merely rudimentary knowledge of Representation Theory. I only revisit it when I come across an idea that I find suggestive for possibly unsticking that work.

When doing so, and in particular for this pet problem of mine, I proceed very loosely, considering ideas however flimsy their connection may be. I only care that they are "suggestive".

In this sense, I never pay much respect to the distinctions made in the literature between keywords such as "classification", "clustering", "community finding", "partitioning", "grouping", etc., or even "tessellation", "phase separation", "fragmentation" or "pattern formation". Sure, their contexts and problem settings are different, but they also share a key pattern, namely, finding "heterogeneities" in what before was a "homogeneous" class of objects.

One such triggering idea was that classification is akin to finding (and computing!) a mapping that tells us whether two such objects satisfy the same set of properties; alternatively, the canonical mapping giving the "group" each of those objects "belongs to" (see Avi Wigderson, Mathematics and Computation, 2017, p. 19).

Can we extend this to our case?  Would it be the case that:

1.- Clustering  $\iff$ a computation/program/function ?

2.- Clustering entails a model of computation?

For instance, we may consider that the partition $\bar 0$ of all _singletons_ would naturally map to the identity function. Similarly, the partition $\bar{\mathbb 1}$, with all elements in a single block, would map to a constant function.
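As a toy illustration on a finite set (a minimal sketch; the function and variable names are mine), a partition induces a canonical map sending each element to a representative of its block, and the two extreme partitions recover exactly the identity and a constant function:

```python
# Sketch: the canonical map induced by a partition of a finite set,
# sending each element to the representative (first element) of its block.
def quotient_map(partition):
    return {x: block[0] for block in partition for x in block}

elems = [0, 1, 2, 3]
singletons = [[x] for x in elems]   # the partition 0-bar: all singletons
one_block = [elems]                 # the partition 1-bar: a single block

print(quotient_map(singletons))  # -> {0: 0, 1: 1, 2: 2, 3: 3}, the identity
print(quotient_map(one_block))   # -> {0: 0, 1: 0, 2: 0, 3: 0}, a constant map
```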

We may look at 1) from a more general point of view (something you could consider a common strategy when trying to solve a problem: a more general setting may sometimes be more amenable to analysis and solution). Case in point, in Mathematics and Physics, especially in the latter, one very often generalizes a discrete problem to a continuum one, e.g., translating discrete sums $\sum_i\,f(i)$ into integrals $\int\,dx\,f(x)$.

How can we cluster a continuous set? In the discrete case we have a finite set of vertices $\mathbb V$ and a finite set of edges $\mathbb{E}\,\subset \, \mathbb{V}\times\mathbb{V}$.

How could we define a continuous graph?

One possibility is to proceed in a similar way to the discrete case. 

Let $\mathbb V\,=\,\left[0\,,\,1\right]$ be the continuous set of vertices and $f\,:\,\mathbb{V}\,\to\,\mathbb{I}\,\subset\,\mathbb{V}$ an injective continuous function. Then we can define a continuous directed graph as $\mathbb{G}\,=\,\left(\mathbb{V}\,,\,\mathbb{E}_f\right)$, where the continuous set of edges is the graph of $f$, $\mathbb{E}_f\,=\,\left\{\left(x\,,\,f(x)\right)\,:\,x\,\in\,\mathbb{V}\right\}\,\subset\,\mathbb{V}\times\mathbb{I}$.

Consider, e.g., $f(x)=(2\,x\,+\,1)/4$; the graph $\mathbb{G}$ is described by the plot of $f$, which gives us its edge-set $\mathbb{E}_f$.

For a different edge-function, $g(x)=\frac{-3}{4(2x-3)}$, we get a different continuous graph with edge-set $\mathbb{E}_g$.

Any flow on these directed graphs ends on an attractor vertex, given by $x^*=1/2$ and $x^*=(3-\sqrt{3})/4$ for $f$ and $g$, respectively.
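As a quick numerical check (a minimal sketch; the function names are mine), one can follow such a flow by iterating the edge-function and watch it settle on the claimed attractors:

```python
# Sketch: follow the directed edges by iterating the edge-function,
# starting from an arbitrary vertex in [0, 1].
def f(x):
    return (2 * x + 1) / 4

def g(x):
    return -3 / (4 * (2 * x - 3))

def flow(step, x0, n=100):
    """Hop along edges n times starting from vertex x0."""
    x = x0
    for _ in range(n):
        x = step(x)
    return x

print(flow(f, 0.0))  # -> 0.5
print(flow(g, 0.0))  # -> approx. (3 - sqrt(3)) / 4 = 0.31698...
```

Both maps are contractions on $[0,1]$, so the starting vertex does not matter.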

These two examples are continuous graphs determined by a regular function. But we can envision other continuous graphs that can't be defined by a regular function. Some sketches will make it clearer.





In general, we will have a continuous, weighted, directed graph with an edge-manifold implicitly defined by some $F(x\,,\,y\,,\,z)\,=\,0$. Basically, extrude those plots vertically along the axis perpendicular to the screen. Any point $P(x,y,0)$ on the plane $z=0$ defines whether two vertices $x$ and $y$ are connected with an edge from $x$ to $y$. The height above that plane determines the weight of the edge, $W(x,y)$.

These graphs can contain both continuous and discrete components:
$\mathbb{E}-\mbox{plot}\equiv\epsilon(x,y)=\sum_c\,W_c(x,y)\,+\,\sum_{x_0}\delta(x-x_0)\delta(y-y(x_0))$
where the first term sums over all continuous components and the second over the discrete set of edges with source $x_0$ and target $y_0\equiv y(x_0)$.

The associated adjacency plot could be defined as
$A-\mbox{plot}\equiv{\cal A}(x,y)=\sum_c\,\theta\big(W_c(x,y)\,+\,W_c(y,x)\big)$
where $\theta$ is the Heaviside step function and, for simplicity, we have ignored the discrete component and assumed all edge weights are non-negative. The absence of an edge is encoded as $W_c(x,y)=0$, for lack of a better criterion.
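As a small sketch of the last formula (the weight function below is an assumed example, not from the text), the adjacency plot amounts to a Heaviside step applied to the symmetrized weight:

```python
# Sketch: adjacency as theta(W(x, y) + W(y, x)), ignoring the discrete
# component and assuming non-negative weights.  W is a hypothetical
# example weight: an edge from x to y only when y lies strictly below x.
def W(x, y):
    return max(0.0, x - y)

def heaviside(s):
    return 1.0 if s > 0 else 0.0

def adjacency(x, y):
    return heaviside(W(x, y) + W(y, x))

print(adjacency(0.7, 0.2))  # -> 1.0 (an edge in at least one direction)
print(adjacency(0.5, 0.5))  # -> 0.0 (no edge: both weights vanish)
```

Note that the symmetrization makes the adjacency independent of the edge direction, as one expects of an adjacency relation.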

Note: Writing these ideas down on the blog always takes more time than one expects. Unfortunately, I lost the original document where I first wrote this down, and now I only have a pdf version left. When exporting it to pdf, those first two plots fell between two pages; hence the thick black horizontal line.

Questions that I had:
1. What does function composition tell us about clustering?
2. What does clustering tell us about function composition?
3. How can clustering define composition of functions?

Friday, October 10, 2025

On the Impact of AI/LLMs in Mathematics and the Role of Embeddings

For a view on LLMs from the point of view of Mathematical research, watch Alex Kontorovich's recent talk at Harvard CMSA, The Shape of Math to Come [2].

If you want to skip the beginning, I'd suggest jumping to min 33 [1]. He does, however, make a reference to earlier slides when describing what speedup he's looking for when it comes to using LLMs to generate new mathematical results.

Tl;dr: He envisions a future of math hinging on two components: first, formalization in a language like Lean or similar (Agda, Coq, ...), and second, an AI component atop the former. In this case he only has in mind a generative one, namely an LLM (compare this to LeCun's Dino talk [3]). At min 33 he starts conveying the main idea of where he wants to get: what matters is the idea we want to express, not its formalization, except that we need a system or process ensuring the latter actually conveys the former correctly. Even if an LLM system like AlphaProof could generate new mathematical results (papers) that are fully correct and true in 99% of the cases, it would not be useful for advancing Mathematics, precisely because of that probabilistic nature of its correctness. But LLMs could help in the initial step of translating ideas into Lean and then finding a proof, which, by the typed nature of Lean, is correct if it "compiles". Therein could lie the real speedup in Mathematical research: researchers would be able to spend more time shaping their ideas, instead of having to dedicate quite a significant effort to proofreading their own derivations.

I wonder if one could not draw a direct analogy between the role of that formalization step in Lean and the underlying universal representation, the embedding step in AI parlance. In the following text I will be using embedding, representation, description and, to some extent, formalization interchangeably.

Here, formalization is meant as a graded concept: the same math result expressed in Lean would be 100% formalized, while in natural language it would be anywhere between 0% and, say, 1%; the point being that it's upper bounded.

In this sense I would say that describing some objects as vectors using an array of integers is in itself a formalization level, at the same time as an embedding. But you could be describing your vectors in a synthetic way, as geometrical objects (the arrows). Or you might be able to dispense with the idea of vectors altogether and describe your problem in a purely diagrammatic way. Mind you, all these descriptions belong to well formalized areas of Mathematics: Linear Algebra, Synthetic Geometry or, say, Clifford Algebra or Geometric Algebra, and (many) diagrammatic descriptions can be put within Category Theory.

Each description/representation of your world will have different computational advantages. Hence, using different formal math tools (say, from different math fields) could make a huge improvement in the results or usefulness of any given ML/AI tool. Maybe each of these AI tools has a different representation (description, embedding, formalization) where its use is optimal.

So, while, say, the area of Reinforcement Learning (RL) is currently stuck, that could change if we find a better description suitable for it. The same applies to a practical problem you may be stuck on while trying to solve it with some ML/DL tool: changing your world description may lead you efficiently to its solution.

The latter is in fact a "universal" trick, and not only in Math or Physics: change the way you look at things and they can go from being a roadblock to boosting you forward.


Anyway, it's highly possible that all this is but incoherent rambling under the influence of a little bug I'm currently evicting.


Note: In Kontorovich's video, image and sound are badly out of sync; however, sound and subtitles are in sync.

[1] min 33: https://youtu.be/xIG1RI44Nog?t=2002

[2] full talk: https://www.youtube.com/watch?v=xIG1RI44Nog

[3] LeCun's talk on Dino3: https://www.youtube.com/watch?v=yUmDRxV0krg


PS: Talking about different descriptions, YouTube just fed me this other recent video, "Towards a Geometric Theory of Deep Learning" by Govind Menon. Maybe worth a closer look another time.


PS2: The Internet gods pointed me to another video that would perfectly match as a second part to Kontorovich's: "Where is mathematics going (September 24, 2025)" by Kevin Buzzard, sponsored by the Simons Foundation.

Monday, October 6, 2025

On "The Bitter Lesson" by Richard Sutton

Just found an interview with Richard Sutton referencing his "Bitter Lesson" post from 2019. See link at the bottom.

He claims to be making two points. I think his first one is a useful view for considering which AI topic might be more fruitful in the long run. Here, long run is meant on a personal level, not on the long run of Science. Read on.


The successful approaches have been based fundamentally on just "search" and "learning". These are what are often criticized as "simple brute-force approaches".


This approach has been the one leading to the current widespread adoption and surprising effectiveness of RL (exemplified by solutions like AlphaGo and AlphaZero), Deep Learning and LLMs.


And this success should be contrasted with the comparatively failed attempts at implementing techniques built on our own understanding of how we solve those same tasks or of how we learn.


In summary, trying to come up with techniques based on our previous world models fared way worse than techniques based on simple, general brute-force methods.


As he mentions in that interview, these claims are meant but as a reflection of what has or has not worked out in the past 70 or so years. He makes no claim this will be the case in the future.


His second claim consists of his very last paragraph. However, I'm not certain if his point here isn't just a rephrasing of the previous one. 


Neither am I convinced he is consistent: one might argue that what he thinks of as "search & learning" are but idealizations, i.e., models of methods created by our minds. In particular, I think that what he considers "learning" is simply RL, reinforcement learning.


But this method is nothing but human knowledge that we would be putting into the AI, which is the approach he seems to be arguing against in this second point.


His only argument around this seems to be that he considers these two processes simple enough to serve as guiding principles. They do fit with the view of his first point, though.


References

https://en.wikipedia.org/wiki/Richard_S._Sutton 

http://www.incompleteideas.net/IncIdeas/BitterLesson.html


Tuesday, August 26, 2025

The "homotopic"/Geometric Nature of Mathematical Induction

Mathematical Induction for the Natural Numbers ${\mathbb N}$ can be expressed as the following inference rule within Martin-Löf's Dependent Type Theory:

$\Gamma\,,\,x:{\mathbb N}\,\vdash\,P(x)\;Type$

$\Gamma \,\vdash\,p_0\,:\,P(0_{\mathbb N})$

$\Gamma \,\vdash\,p_s\,:\,\Pi_{(n:{\mathbb N})}\,P(n)\,\to\,P(succ_{\mathbb N}(n))$

 $-----------------\;{\mathbb N} ind$ 

$\Gamma \,\vdash\, ind_{\mathbb N}(p_0\,,\,p_s) \,:\, \Pi_{(n:{\mathbb N})}\,P(n)$

with the computation rules (where we implicitly assume the same premises):

$ind_{\mathbb N}(p_0\,,\,p_s\,,\,0_{\mathbb N})\,\doteq\,p_0\,:\,P(0_{\mathbb N})$

$ind_{\mathbb N}(p_0\,,\,p_s\,,\, succ(n) ) \,\doteq\, p_s(n\, ,\, ind_{\mathbb N}(p_0\,,\,p_s\,,\, n )) \,:\, P(succ(n))$

Together this can be expressed graphically as  


Thus, induction provides a way to build a function for free on an Inductive Type: if we have a way to hop from one fiber to "its neighboring" fiber, then, given a point on a starting fiber, we can build a section in the total space of that Type.

That way of "hopping" between fibers is nothing else than the inductive step! 
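The rule above has a direct concrete counterpart (a sketch in Lean 4; the identifiers `indNat`, `p0`, `ps` are mine): the built-in recursor `Nat.rec` plays the role of $ind_{\mathbb N}$, taking the base point and the fiber-hopping step and returning a full section.

```lean
-- Sketch: Nat.rec is Lean's built-in induction principle for Nat,
-- playing the role of ind_N above.  Given p0 : P 0 and an inductive
-- step ps, it produces a dependent function (a section) (n : Nat) → P n.
def indNat (P : Nat → Prop) (p0 : P 0)
    (ps : (n : Nat) → P n → P (n + 1)) : (n : Nat) → P n :=
  fun n => Nat.rec (motive := P) p0 ps n

-- The first computation rule holds definitionally:
example (P : Nat → Prop) (p0 : P 0) (ps : (n : Nat) → P n → P (n + 1)) :
    indNat P p0 ps 0 = p0 := rfl
```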

A first analogy: a curve can be defined by a differential equation and an initial condition. This equation can be given by a vector field that expresses the tangent to the curve at each point.

Also, in Differential Geometry we have the Covariant derivative, which expresses how we can move a tensor between infinitesimally close points in order to compare it meaningfully at those two points. Imposing the "Covariant change" to be zero along a given curve gives us a particular way to move a tensor between infinitesimally close points such that it stays parallel. This is the parallel transport of a tensor.

We can do the following: parallel transport a vector field along the very same curve it defines.  This curve is nothing other than a geodesic.

I'm definitely tempted to draw here an analogy between these Differential Geometry constructions and the induction rule; I'm just not sure what the best analogy might be. As induction gives us a curve, is induction akin to building a geodesic through a point?

A more general analogy is the relation between induction and Sheaf theory. Paraphrasing an LLM, "both guarantee existence of global objects from local rules, though the technical frameworks differ: induction is sequential and algebraic; sheaf theory is topological and categorical".

Analogies aside, Dependent Type Theory through the lens of Homotopy Type Theory is very rich in intuitive pictures. I find this super cool!

Thursday, August 21, 2025

The Curry-Howard Correspondence

 Logic, from the point of view of Dependent Type Theory (DTT), hinges on the Curry-Howard interpretation/correspondence:

Any proposition/predicate is a Dependent Type (DT)

Take the following statement about the Natural numbers $\mathbb N$: $d\,\vert\,n\;\equiv$ "d divides n"

As we know from classical Number Theory, this is equivalent to saying (classical logic) $\exists\, k\,\in\,{\mathbb N}\;\vert\quad d\cdot k\,=\,n$.

Hence, the correspondence is 

$d\, \vert\, n\quad\leftrightarrow\quad\Sigma_{(k:\mathbb N)}d\cdot k\;=\;n$

On the LHS we have classical Logic/Number Theory; on the RHS we have Dependent Type Theory. Thus, in DTT the existential quantifier gets translated into a dependent sum.


But what is that equality on the RHS? Answer: $IdentPath_{\mathbb N}(d\cdot k\,,\,n)$, the Path type defined on the Natural numbers identifying both values.


Thus, any value $p$ of that sum type, $p\,:\,\Sigma_{k:\mathbb N}d\cdot k = n$, is a point $\left(k\, ,IdentPath_{\mathbb N}(d\cdot k\, , n)\right)$ that provides a proof of the divisibility of $n$ by $d$.

Let's see how we would prove a statement like "any natural number $x$ is divisible by 1". The universal quantifier from classical logic gets translated into the Product type:

$\Pi_{x:\mathbb N }\,1\vert x\quad\equiv\quad\Pi_{x:\mathbb N}\Sigma_{k:\mathbb N}\,1\cdot k = x$

In order to prove this we need to find/construct a "witness", an element $p\,:\,\Pi_{x:\mathbb N}1\vert x$. Incidentally, this is called a section of the space $\Sigma_{x:\mathbb N}1\vert x$.

Decomposing $p$ by applying it to a number $x$ we see that it must be $p(x):\Sigma_{k:\mathbb N}1\cdot k = x$. Again, this is of the form $(k, q(x))$, where $q(x):\,1\cdot k\,=\,x$.

Here is where we use the fact that, in a DTT formulation of the Natural Numbers, we would previously have constructed a path like $r(x)\,:\,1\cdot x\,=\,x$, which corresponds to the rule of left multiplication by the neutral element of the product in the Natural numbers.

Hence, we have the element $p(x) = \left(x\, ,r(x)\right)$, which is of the desired type $\Sigma_{k:\mathbb N}1\cdot k\,=\,x$. As $x$ is an arbitrary number, $p$ is the sought-for function $p\,:\,\Pi_{x:\mathbb N}\,1\vert x$.

We have thus proved in Dependent Type Theory that 1 divides any natural number.
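The same construction can be written down concretely (a sketch in Lean 4; the names `divides` and `one_divides` are mine, and Lean's `∃` is used as a stand-in for the dependent sum): the proof term is exactly the pair $\left(x\,,\,r(x)\right)$ from above, with `Nat.one_mul` playing the role of the path $r(x)$.

```lean
-- Sketch: divisibility as an existential over a witness k, and the
-- section proving that 1 divides any x, given by the pair (x, 1*x = x).
def divides (d n : Nat) : Prop := ∃ k, d * k = n

theorem one_divides : ∀ x : Nat, divides 1 x :=
  fun x => ⟨x, Nat.one_mul x⟩
```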


I thought these two examples could show the gist of how to think about logical problems and proofs in Dependent Type Theory.


Andrej Bauer's online course slides are great, and I'm starting to like quite a lot one of the books he recommends for further reading: "Introduction to Homotopy Type Theory" by Egbert Rijke, available for free at https://arxiv.org/pdf/2212.11082. These examples are from there.


Sunday, July 27, 2025

"Holography" in Euclidean Geometry

Let's make one last summary of all below.

Since I learned on my own to address problems in Euclidean geometry from a purely synthetic approach, I've been fascinated by it. If only I had learned this in my teenage years...

Yet this time the passion for a visual approach seems to have misled me down a winding rabbit hole, only to come out at what would have seemed trivial had I chosen a more analytical approach of counting degrees of freedom.

We need 6 numbers to specify the coordinates of three points in the plane. If it's a right triangle, only five coordinates are independent. 

But we don't have to stick to cartesian coordinates. 

If we fix one point at the origin, we need four numbers for the two remaining points, plus two more for defining the translation that put this triangle at the origin.

We can repeat that thought process and further constrain a second point to be on the vertical axis. We can bring it there by a rotation, after using a translation to fix the first point. This then leaves three coordinates for one point "and a half", plus a number for specifying the rotation we had to apply, and two more numbers for the translation. For a right triangle, the third point is constrained as well, in this case to be on the horizontal axis. Again it amounts to 6 and 5 dof, respectively.

If we fix the length of the "vertical" side to 1, we reduce the previous count of coordinates by one, which we use instead to specify a dilation. For right triangles we are then left with one horizontal coordinate, one scaling factor, one rotation, and the two numbers for a translation.

And this is nothing more than what all the rambling below amounts to.

So, yeah, we have reduced a right triangle to a single point on a line; the whole 2d figure "emerges" from that single point. But the number of degrees of freedom has not changed; we simply hid them in the form of transformations.
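The bookkeeping above can be tallied explicitly (a trivial sketch; the variable names are mine):

```python
# Sketch: degrees-of-freedom bookkeeping for triangles in the plane.
translation, rotation, dilation = 2, 1, 1

# Generic triangle: one vertex at the origin, one pinned to the vertical
# axis at height 1, leaving the third vertex free (2 coordinates).
generic = translation + rotation + dilation + 2
print(generic)  # -> 6, matching the 6 cartesian coordinates

# Right triangle: the third vertex is further pinned to the horizontal
# axis, leaving a single horizontal coordinate.
right = translation + rotation + dilation + 1
print(right)  # -> 5
```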

I think one take-home lesson is to always explore a problem from multiple perspectives, e.g., synthetic & analytic, very early on. Some paths offer much less resistance for making solid steps forward.

--

Saturday, April 19, 2025

On the foolishness of NL programming

 The title of this post is but that of an article by the late E.W. Dijkstra that is still fully valid in these days of frantic work on LLMs:

The virtue of formal texts is that their manipulations, in order to be legitimate, need to satisfy only a few simple rules; they are, when you come to think of it, an amazingly effective tool for ruling out all sorts of nonsense that, when we use our native tongues, are almost impossible to avoid.

Instead of regarding the obligation to use formal symbols as a burden, we should regard the convenience of using them as a privilege: thanks to them, school children can learn to do what in earlier days only genius could achieve. When all is said and told, the "naturalness" with which we use our native tongues boils down to the ease with which we can use them for making statements the nonsense of which is not obvious.

You can access it here.