First published on Sunday, Mar 9, 2025 and last modified on Wednesday, Apr 9, 2025 by François Chaplais.
University of Warsaw, Department of Mathematics, (MIMUW), Ulica Banacha 2, 02-097 Warsaw
This research is part of the project No. 2022/47/P/ST1/01177 cofunded by the National Science Centre and the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no. 945339. For the purpose of Open Access, the author has applied a CC-BY public copyright licence to any Author Accepted Manuscript (AAM) version arising from this submission. The first author extends heartfelt gratitude to the organizers of the Wisła Baltic Workshop 2023 and Maria Ulan for the invitation to lecture on information geometry, an opportunity that ultimately led to the creation of this book for students, researchers, and all those curious about this fascinating field. The first author is also grateful to Maxim Kontsevich for asking stimulating questions on the topic, which inspired deeper exploration and helped illuminate aspects of the subject that had remained unclear. This textbook is, in part, an effort to clarify these questions and provide a structured introduction to the topic. The first author thanks Katarzyna Pietruska-Pałuba for a great motivational discussion, which led to the final shape of this manuscript. We thank Janusz Grabowski for his interest in the subject. Warm thanks also go to Matilde Marcolli. Finally, we would like to express all our gratitude to Yuri I. Manin, whose untimely death has saddened us all.
Welcome to this Introductory Textbook on Geometry and Information Geometry! Whether you are a student, researcher, or simply someone curious about this fascinating field, this book is designed to provide you with a clear and accessible entry point into the world of geometry and its modern extension into information geometry.
Geometry, in its many forms, has been a cornerstone of mathematics for centuries. With the advent of information theory and machine learning, new geometric structures have emerged that help us better understand complex probabilistic models, optimization, and data analysis. Information geometry is one such powerful framework, blending classical differential geometry with modern applications in probability and statistics.
This book is structured to guide you from the fundamentals of topology and differentiable manifolds to the more advanced concepts of probability geometry and Frobenius manifolds. The progression is designed to be intuitive, allowing you to build your knowledge step by step.
Throughout the chapters, you will find exercises to test your understanding, as well as solutions to some of the more challenging problems to aid your learning. To complement the material, you are encouraged to follow along with the accompanying YouTube video (see the QR code below), which aligns with this textbook and served as an inspiration for its creation.
[QR code: link to the accompanying YouTube video]
We hope this book provides you with both insight and enjoyment as you embark on your journey into geometry and information geometry.
This introductory text arises from a lecture given by the first author in Göteborg, Sweden, and is intended for undergraduate students, as well as for any mathematically inclined reader wishing to explore a synthesis of ideas connecting geometry and statistics. At its core, this work seeks to illustrate the profound and yet natural interplay between differential geometry, probability theory, and the rich algebraic structures encoded in (pre-)Frobenius manifolds.
The exposition is structured into three principal parts. The first part provides a concise introduction to differential topology and geometry, emphasizing the role of smooth manifolds, connections, and curvature in the formulation of geometric structures. The second part is devoted to probability, measures, and statistics, where the notion of a probability space is refined into a geometric object, thus paving the way for a deeper mathematical understanding of statistical models. Finally, in the third part, we introduce (pre-)Frobenius manifolds, revealing their surprising connection to exponential families of probability distributions and, more broadly, discussing their role in the geometry of information. At the end of each of these three parts the reader will find stimulating exercises.
By bringing together these seemingly distant disciplines, we aim to highlight the natural emergence of geometric structures in statistical theory. This work does not seek to be exhaustive but rather to provide the reader with a pathway into a domain of mathematics that is still in its formative stages, where many fundamental questions remain open. The text is accessible without requiring advanced prerequisites and should serve as an invitation to further exploration.
In what follows, we shall be concerned with a surprising fusion of ideas, namely, that of differential geometry and probability theory (or, more generally, statistics). At first glance, these two domains might appear unrelated, yet their synthesis gives rise to a highly rich and intricate mathematical structure, leading to the emergence of a relatively young field, known as the geometry of information. The relevance of this framework extends beyond pure mathematics, with applications in artificial intelligence (such as in language models like ChatGPT) and machine learning. Given the accelerating pace of developments in these areas, it is imperative to pursue the deeper mathematical underpinnings of this theory.
One of the great advantages of geometry is that it offers an intuitive and often visual means of understanding complex situations. This feature allows us to circumvent difficulties that might otherwise seem insurmountable. More precisely, at the heart of the geometry of information lies a fundamental object: the manifold of probability distributions, where the probability measures belong to a specific class of distributions.
To make matters precise, recall that a probability space consists of a triple \( (\Omega, \mathcal{F}, P)\) , where \( \Omega\) is the sample space (the set of possible outcomes), \( \mathcal{F}\) is a \( \sigma\) -algebra of measurable subsets of \( \Omega\) , and \( P\) is a probability measure assigning to each event in \( \mathcal{F}\) a real number in the interval \( [0,1]\) . This object \( (\Omega, \mathcal{F}, P)\) serves as a rigorous mathematical model for phenomena arising in nature. The challenge we face is to endow this structure with additional geometric data, thereby giving rise to a natural geometric space. However, this task is far from trivial.
A first step is to recognize that the set \( \mathcal{F}\) carries a particular mathematical structure: that of a \( \sigma\) -algebra. The significance of \( \sigma\) -algebras lies in their closure properties: if \( (\Omega, \mathcal{F})\) is a measurable space, then \( \mathcal{F}\) is closed under countable unions, countable intersections, and complements. However, when seeking to introduce a geometric framework, it is often more convenient to work directly with probability distributions rather than the space \( (\Omega, \mathcal{F})\) itself.
Probability distributions come in a vast array of different families. To clarify our discussion, let us recall the three principal classes of probability distributions:
For our purposes, we restrict attention to the second class, focusing in particular on absolutely continuous distributions supported on intervals of length \( 2\pi\) , namely, wrapped exponential distributions.
A remarkable fact, which has emerged through careful analysis, is that probability distributions of this type possess a hidden geometric structure. The apparatus of differential geometry provides the natural language for uncovering and describing this structure. Objects such as connections, parallel transport, curvature, and flatness serve as fundamental tools in the construction of a geometric framework for probability distributions. This allows for the realization of a novel class of geometric spaces: the manifolds of probability distributions.
More concretely, consider a measurable space \( (\Omega, \mathcal{F})\) , where \( \mathcal{F}\) is a \( \sigma\) -algebra on \( \Omega\) . If we consider a family of parametrized probability distributions on \( (\Omega, \mathcal{F})\) , then the space of such distributions naturally inherits the structure of a manifold.
The study of these manifolds, particularly in the case of exponential families, leads to unexpected connections with Topological Field Theory (TFT). The relation between exponential families and TFT is not accidental; rather, it is deeply encoded in the mathematical structures underlying both domains. Indeed, Topological Field Theory is intimately linked to the celebrated Witten–Dijkgraaf–Verlinde–Verlinde (WDVV) equations, which in turn play a central role in the theory of Frobenius manifolds, developed and studied by Dubrovin, Manin, Kontsevich, and others.
A fundamental aspect of our approach, which renders manifest the deep relation between the manifold of probability distributions of exponential type and the (pre-)Frobenius manifold structures, lies in the delicate matter of choosing an appropriate system of coordinates. Indeed, the very possibility of perceiving this connection in an explicit and natural manner is contingent upon the identification of a privileged class of coordinates—ones that reflect, in their very definition, the intrinsic geometry of the underlying structures. The art, then, is not merely to introduce coordinates, but to do so in a manner that unveils (rather than obscures) the hidden algebraic and differential properties inherent in the space.
Thus, in certain cases, we are able to establish a deep and precise connection between the WDVV equations and the geometry of information. This, in turn, suggests that the study of probability distributions, when viewed through a geometric lens, is far richer than initially expected and might hold profound implications for both mathematical physics and information theory.
This introductory textbook is structured in three interrelated parts, each designed to build a solid foundation in modern mathematics and lead the reader toward the emerging field of information geometry.
The book begins with a modern treatment of general topology, reinterpreting Kuratowski’s early ideas [1] in a contemporary framework. Readers are introduced to essential topological concepts—such as open sets, continuity, convergence, and compactness—establishing the language and tools necessary for later discussions.
Building on the groundwork of topology, the text moves into the realm of manifolds. Chapter 2 explains topological manifolds—spaces that locally resemble Euclidean space—while also presenting modelled manifolds based on S. Lang’s influential works [2, 3]. For additional reference, the text suggests consulting [4] to cover aspects not fully developed in this book. This chapter serves as a bridge, transitioning from abstract topological spaces to concrete geometric structures.
Differentiability is explored next with a focus on Gateaux derivatives. This notion is particularly useful in infinite-dimensional settings, especially in probability theory where a norm may not be available. The text carefully contrasts Gateaux differentiability with the stronger concept of Fréchet differentiability, ensuring students understand the subtleties and applications of both approaches.
Fiber bundles are introduced as indispensable tools in differential geometry. This chapter outlines the local-to-global perspective that fiber bundles offer, demonstrating how complex geometric structures can be understood by piecing together simpler, locally trivial components.
Delving deeper into the geometric framework, Chapter 5 covers connections, parallel transport, and covariant derivatives, which are key concepts for understanding how geometric data evolves along a manifold. Classical references such as [5] and [6] provide further reading on these fundamental topics. Additionally, an introduction to sheaves is provided, offering a gentle entry point into their role in modern geometry, with [7] recommended for a complete reference. This material lays the groundwork for applying geometric techniques within information geometry later in the book.
The second part shifts focus to probability theory and statistics. It introduces standard definitions, theorems, and methodologies, employing familiar examples—such as coin flipping—to illustrate key concepts. The chapter also discusses the Radon–Nikodym derivative, linking measure theory with probabilistic reasoning. References such as [8, 9] for measure theory and [10] for statistics are suggested for further reading.
These chapters are particularly innovative, blending philosophy with mathematics. Beginning with a discussion inspired by Klein’s geometry and Plato’s ideas, the text presents a novel perspective on how categorical structures naturally arise in the study of probability distributions. By considering manifolds of probability measures and the role of Markov kernels, the reader is guided through a modern generalization of classical geometry into a probabilistic and categorical setting. Key references for this section include [11, 12, 13, 14, 15].
The final part of the book centers on Frobenius manifolds—geometric structures that elegantly encapsulate the interplay between algebra and geometry, coming from 2D Topological Field Theory. Here, the text explains how Frobenius structures emerge naturally within the framework of information geometry. For foundational references on Frobenius manifolds and related topics, the book cites [16, 17].
Chapter 11 also integrates cutting-edge research from 2020 onward [18, 19], demonstrating the latest developments in the field. A notable highlight is the discussion of learning methods pioneered by researchers such as D. Ackley, G. Hinton, and T. Sejnowski [20], showing how deep learning techniques relate to the broader mathematical framework of information geometry.
Conclusion
This textbook provides a concise, accessible, and modern overview of several core areas of mathematics (topology, geometry, probability, and category theory), culminating in the study of information geometry. Through careful exposition and a judicious selection of topics, it equips students with the necessary tools to explore this young and rapidly evolving branch of mathematics, bridging classical theory with modern research and applications.
General topology, also known as point-set topology, provides the foundational language for modern mathematics by studying the most fundamental properties of sets and their structures. Topology focuses on the intrinsic properties of spaces that remain unchanged under continuous transformations.
A topological space is a set \( X\) equipped with a topology, that is, a collection of subsets called open sets that satisfy specific axioms and ensure consistency with notions of continuity and convergence. This framework generalizes many familiar mathematical spaces, including metric spaces and Euclidean spaces.
Let us mention some key concepts in general topology:
These fundamental concepts serve as the basis for more advanced areas of mathematics, including analysis, geometry, algebraic topology, and functional analysis.
For readers already familiar with general topology, this section can be skipped. However, for those who need a refresher, the following discussion will provide a structured and intuitive approach to these essential topics before moving on to more advanced material.
Definition 1 ( Topological space)
Let \( X\) be a set and denote by \( \mathcal{P}(X)\) the power set, that is, the set of all its subsets. A topology on \( X\) is a distinguished collection of subsets, \( \mathcal{T} \subset \mathcal{P}(X)\) , which we regard as specifying the admissible open sets. This collection satisfies the following axioms:
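The set itself and the empty set belong to the topology: \( X\in \mathcal{T}\) and \( \emptyset\in \mathcal{T}\) .
Any arbitrary union of open sets remains open: if \( \mathcal{U}_{i}\in \mathcal{T}\) for all \( i\in I\) , then \( \bigcup_{i\in I}\mathcal{U}_{i}\in \mathcal{T}\) .
Any finite intersection of open sets remains open: if \( \mathcal{U}_{1},\dots,\mathcal{U}_{n}\in \mathcal{T}\) , then \( \bigcap_{k=1}^{n}\mathcal{U}_{k}\in \mathcal{T}\) .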
Thus, a topology provides the language of continuity: it defines what it means for a function to be continuous without reference to distances, relying only on the structure of open sets.
The elements of \( \mathcal{T}\) are called the open sets of the topology; the conditions (1), (2) and (3) form the axioms of a topology.
To summarize, we can therefore also state this definition by saying that a topology is a collection of subsets of \( X\) , called open sets, which must satisfy the following: the whole set and the empty set are open, any union of open sets is open, and any finite intersection of open sets is open.
We will sometimes write \( (X,\mathcal{T})\) to specify that we are considering a set \( X\) equipped with its topology \( \mathcal{T}\) .
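For a finite set, the axioms of Definition 1 can be checked mechanically. The following is a minimal sketch, not part of the original text; the function name `is_topology` and the example families are illustrative assumptions.

```python
from itertools import combinations

def is_topology(X, T):
    """Check the topology axioms for a finite set X and a family T of subsets."""
    X = frozenset(X)
    opens = {frozenset(U) for U in T}
    # Every member of T must be a subset of X.
    if any(not U <= X for U in opens):
        return False
    # The whole set and the empty set must be open.
    if X not in opens or frozenset() not in opens:
        return False
    # For a finite family, closure under pairwise unions and pairwise
    # intersections is enough to get all unions and all finite intersections.
    return all(U | V in opens and U & V in opens
               for U, V in combinations(opens, 2))

X = {1, 2, 3}
chain_topology = [set(), {1}, {1, 2}, X]
not_a_topology = [set(), {1}, {2}, X]   # the union {1, 2} is missing

print(is_topology(X, chain_topology))   # True
print(is_topology(X, not_a_topology))   # False
```

The check is only meaningful for finite families; for infinite spaces the axioms have to be verified by proof.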
Definition 2 ( Basis)
A basis \( \mathfrak{B}\) for a topology \( \mathcal{T}\) is a family of elements of \( \mathcal{T}\) such that every \( \mathcal{U}\in \mathcal{T}\) is the union of elements of \( \mathfrak{B}\) .
We then refer to this as a topology generated by \( \mathfrak{B}\) .
An equivalent definition can be given. A basis \( \mathfrak{B}\) for a topology \( \mathcal{T}\) is a family of elements of \( \mathcal{T}\) such that for each \( x\in X\) and \( \mathcal{U}\in \mathcal{T}\) , with \( x\in \mathcal{U}\) , there exists \( \mathcal{B}\in \mathfrak{B}\) such that \( x\in \mathcal{B}\) and \( \mathcal{B}\subset \mathcal{U}\) .
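On a finite set, the topology generated by a basis can be enumerated by taking all unions of subfamilies. The sketch below assumes the given family really is a basis of some topology (an arbitrary family need not be); the name `generated_topology` is hypothetical.

```python
from itertools import chain, combinations

def generated_topology(basis):
    """Return all unions of subfamilies of a finite basis.

    The empty subfamily contributes the empty set.
    """
    basis = [frozenset(B) for B in basis]
    subfamilies = chain.from_iterable(
        combinations(basis, r) for r in range(len(basis) + 1)
    )
    return {frozenset().union(*family) for family in subfamilies}

# The singletons form a basis of the discrete topology on {1, 2, 3}.
discrete = generated_topology([{1}, {2}, {3}])
print(len(discrete))  # 8, the number of subsets of a 3-element set
```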
Example 1
A basis for the usual topology of \( \mathbb{R}\) is provided by the set of open intervals \( \{]a,b[ \mid a<b, a,b\in \mathbb{R}\}\) .
An example of a topology widely used in practice is the metric topology.
Definition 3 ( Metric space)
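A metric space is a set \( M\) endowed with a notion of distance, that is, a function \( d : M\times M \to \mathbb{R}^{+}\) that satisfies, for all \( x,y,z\in M\) : \( d(x,y)=0\) if and only if \( x=y\) ; \( d(x,y)=d(y,x)\) ; and \( d(x,z)\leq d(x,y)+d(y,z)\) .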
The topology of \( M\) is generated by the open balls \( B_{r}(a)=\{x\in M \mid d(a,x)<r\}\) .
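As an illustration, the usual metric axioms (separation, symmetry, triangle inequality) and the open balls \( B_{r}(a)\) can be tested numerically on a finite sample of points. This is only a sketch: passing the test on a sample does not prove that a function is a metric, and the helper names below are assumptions.

```python
import math
from itertools import product

def satisfies_metric_axioms(points, d, tol=1e-12):
    """Test the metric axioms for d on a finite sample of points."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:
            return False
        if (d(x, y) == 0) != (x == y):      # d(x, y) = 0 iff x = y
            return False
        if abs(d(x, y) - d(y, x)) > tol:    # symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:  # triangle inequality
            return False
    return True

def open_ball(points, d, a, r):
    """Sample points lying in the open ball B_r(a)."""
    return {x for x in points if d(a, x) < r}

plane = [(0, 0), (1, 0), (0, 1), (3, 4)]
line = [(0.0,), (1.0,), (2.0,)]
squared = lambda x, y: math.dist(x, y) ** 2  # fails the triangle inequality

print(satisfies_metric_axioms(plane, math.dist))  # True
print(satisfies_metric_axioms(line, squared))     # False
print(len(open_ball(plane, math.dist, (0, 0), 1.5)))  # 3
```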
Exercise 1
The Cantor ternary set \( \mathcal{C}\) is created by iteratively deleting the open middle third from a set of line segments. One starts by deleting the open middle third \( (\frac{1}{3},\frac {2}{3})\) from the interval \( [0,1]\) , leaving two line segments: \( [0,\frac {1}{3}]\cup [\frac {2}{3},1]\) . One then repeats the same procedure on each of the remaining line segments. The Cantor set consists of all points in the interval \( [0,1]\) that are not deleted at any step in this infinite process. Is the Cantor set a metric space?
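The first steps of this construction can be computed exactly with rational arithmetic. The sketch below is only meant to illustrate the iteration; the function names are hypothetical.

```python
from fractions import Fraction

def cantor_step(intervals):
    """Remove the open middle third from each closed interval (a, b)."""
    result = []
    for a, b in intervals:
        third = (b - a) / 3
        result.append((a, a + third))
        result.append((b - third, b))
    return result

def cantor_intervals(n):
    """Closed intervals remaining after n deletion steps, starting from [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = cantor_step(intervals)
    return intervals

step3 = cantor_intervals(3)
print(len(step3))                      # 8 intervals
print(sum(b - a for a, b in step3))    # 8/27, total remaining length
```

After \( n\) steps there are \( 2^{n}\) intervals of total length \( (2/3)^{n}\) , which tends to \( 0\) .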
Definition 4 ( Neighborhood)
Definition 5 ( Trivial and discrete topology)
Definition 6 ( Coarser and finer topology)
Coarser topology : if \( \mathcal{T}_{1}\) and \( \mathcal{T}_{2}\) are two topologies on \( X\) such that \( \mathcal{T}_{1}\subset \mathcal{T}_{2}\) , then
\( \mathcal{T}_{1}\) is said to be coarser than \( \mathcal{T}_{2}\) .
Finer topology : if \( \mathcal{T}_{1}\) and \( \mathcal{T}_{2}\) are two topologies on \( X\) such that \( \mathcal{T}_{1}\subset \mathcal{T}_{2}\) ,
then \( \mathcal{T}_{2}\) is said to be finer than \( \mathcal{T}_{1}\) .
\( \bullet\) The coarsest topology is the trivial topology.
\( \bullet\) The finest topology is the discrete topology.
Definition 7 ( Induced topology on a subset)
Let \( (X, \mathcal{T}_{X })\) be a topological space and \( A \subset X\) be a subset. We define a topology \( \mathcal{T}_{A}\) on \( A\) by setting: \( \mathcal{T}_{A}=\{\mathcal{U}\cap A \mid \mathcal{U}\in \mathcal{T}_{X}\}\) .
In other words, we take as open sets of \( A\) the intersections of open sets of \( X\) with \( A\) .
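For a finite space, the induced topology is obtained by intersecting every open set with \( A\) . A minimal sketch, with a hypothetical helper name:

```python
def subspace_topology(T, A):
    """Open sets of A: intersections of the open sets of X with A."""
    A = frozenset(A)
    return {frozenset(U) & A for U in T}

T_X = [set(), {1}, {1, 2}, {1, 2, 3}]
T_A = subspace_topology(T_X, {2, 3})
print(len(T_A))  # 3 open sets: the empty set, {2} and {2, 3}
```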
Definition 8 ( Quotient topology)
Let \( (X,\mathcal{T})\) be a topological space and \( \mathcal{R}\) be an equivalence relation on \( X\) . Let the map \( p: X\to X/\mathcal{R}\)
associate an element \( x\in X\) with its equivalence class in \( X/\mathcal{R}\) . The open sets of the quotient topology on \( X/\mathcal{R}\) are the subsets \( \mathcal{V}\subset X/\mathcal{R}\) such that \( p^{-1}(\mathcal{V})\in \mathcal{T}\) .
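Under the standard characterization, a set of equivalence classes is open in the quotient exactly when the union of those classes is open in \( X\) . For a finite space this can be enumerated directly; the sketch below represents the relation by its partition into classes, and the function name is an assumption.

```python
from itertools import chain, combinations

def quotient_topology(T, classes):
    """Open sets of the quotient of a finite space.

    T is the topology of X and classes is the partition of X into
    equivalence classes. A family V of classes is open when the union
    of its classes, i.e. the preimage of V under the projection,
    is open in X.
    """
    opens = {frozenset(U) for U in T}
    classes = [frozenset(c) for c in classes]
    families = chain.from_iterable(
        combinations(classes, r) for r in range(len(classes) + 1)
    )
    return {frozenset(V) for V in families
            if frozenset().union(*V) in opens}

T = [set(), {1}, {1, 2}, {1, 2, 3, 4}]
Q = quotient_topology(T, [{1, 2}, {3, 4}])
print(len(Q))  # 3: the empty family, {[1]} and {[1], [3]}
```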
In the study of topological spaces, separation properties play a crucial role in understanding the structure of a space and the behavior of functions defined on it.
Definition 9 ( Hausdorff space - \( \mathbf{T_2}\) )
A topological space \( (X,\mathcal{T})\) is called Hausdorff (separated or \( \mathbf{T_2}\) ) if for any pair of distinct points \( {\scriptstyle M}\) and \( {\scriptstyle N}\) , we can find two open sets \( \mathcal{U}_{M}, \mathcal{U}_{N}\) with \( {\scriptstyle M}\in\mathcal{U}_{M} , {\scriptstyle N}\in\mathcal{U}_{N}\) and \( \mathcal{U}_{M}\cap \mathcal{U}_{N}=\emptyset\) .
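On a finite space, the Hausdorff condition can be verified by brute force over all pairs of points and all pairs of open sets. This is an illustrative sketch with a hypothetical function name.

```python
from itertools import combinations

def is_hausdorff(X, T):
    """Check that any two distinct points have disjoint open neighborhoods."""
    opens = [frozenset(U) for U in T]
    for x, y in combinations(X, 2):
        separated = any(
            x in U and y in V and not (U & V)
            for U in opens for V in opens
        )
        if not separated:
            return False
    return True

discrete = [set(), {1}, {2}, {1, 2}]
sierpinski = [set(), {1}, {1, 2}]       # the point 2 has no small neighborhood

print(is_hausdorff({1, 2}, discrete))    # True
print(is_hausdorff({1, 2}, sierpinski))  # False
```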
Definition 10 ( Normal space)
A topological space \( (X,\mathcal{T})\) is normal if it is Hausdorff and if for any pair of disjoint closed sets \( F_{1}\) and \( F_{2}\) , there exist two disjoint open sets \( \mathcal{U}_{1}\) and \( \mathcal{U}_{2}\) such that \( F_{1}\) is included in \( \mathcal{U}_{1}\) and \( F_{2}\) in \( \mathcal{U}_{2}\) .
Theorem 1
A Hausdorff space \( X\) is normal if and only if it satisfies the following condition:
For every closed subset \( F\subset X\) and every open subset \( \mathcal{U}\) containing \( F\) , there exists an intermediate open subset \( \mathcal{V}\) containing \( F\) satisfying \( \overline{\mathcal{V}}\subset \mathcal{U}\) .
Here \( \overline{\mathcal{V}} \) denotes the closure of \( \mathcal{V}\) , which is properly defined in Sec. 2.1.4. For example, the closure of a point \( p\) in a Hausdorff space is \( \overline{\{p\}}=\{p\}\) .
Example 2
An open set \( \mathcal{U}\subset \mathbb{R}^{n}\) is a Hausdorff space and moreover a normal space.
Definition 11
A topological space is \( \bf{ T_1}\) if for any two distinct points \( x\) and \( y\) , there exist open sets \( U\) and \( V\) such that: \( x\in U\) , \( y\notin U\) and \( y\in V\) , \( x\notin V\) .
In other words, each point is closed (its complement is open).
Exercise 2
Prove or disprove that any Hausdorff space is a \( \mathbf{T_1}\) space. A \( \mathbf{T_1}\) space is a space in which any set consisting of one point is closed.
\( \star\) Warning! There exist \( \bf{ T_1}\) spaces that are not \( \bf{ T_2 }\) .
Exercise 3
Give an example of a \( \bf{ T_1}\) space that is not \( \bf{ T_2 }\) .
Definition 12 ( Continuity)
Consider two topological spaces \( (X,\mathcal{T}),(X',\mathcal{T}')\) . A map \( f:(X,\mathcal{T})\to(X',\mathcal{T}')\) is continuous if for any open set \( \mathcal{U}'\in \mathcal{T}'\) its inverse image \( f^{-1}(\mathcal{U}')\) is an open set of the topology \( \mathcal{T}\) .
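For maps between finite spaces, continuity in the sense of Definition 12 amounts to checking that every open set of the target has an open preimage. In this sketch the map is stored as a dictionary; all names are illustrative assumptions.

```python
def is_continuous(f, X, T_X, T_Y):
    """Check that the preimage under f of every open set of Y is open in X."""
    opens_X = {frozenset(U) for U in T_X}
    for V in T_Y:
        preimage = frozenset(x for x in X if f[x] in V)
        if preimage not in opens_X:
            return False
    return True

X, T_X = {1, 2}, [set(), {1}, {1, 2}]
T_Y = [set(), {"a"}, {"a", "b"}]

print(is_continuous({1: "a", 2: "b"}, X, T_X, T_Y))  # True
print(is_continuous({1: "b", 2: "a"}, X, T_X, T_Y))  # False: preimage of {"a"} is {2}
```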
Theorem 2
A map \( f:(X,\mathcal{T})\to(X',\mathcal{T}')\) is continuous at the point \( x_0\in X\) if for every neighborhood \( \mathcal{N}(f(x_0)) \subset X'\) there exists a neighborhood \( \mathcal{N}(x_0 )\subset X\) of \( x_0\) such that \( f(x) \in \mathcal{N}(f(x_0))\) , whenever \( x \in\mathcal{N}(x_0)\) .
The map \( f\) is continuous on \( X\) if it is continuous at all \( x\in X\) .
Exercise 4
Show that for any pair of continuous functions \( f,g:X\to Y\) , where \( X\) is a topological space and \( Y\) is a Hausdorff space, the set \( \{x\in X \mid f(x)=g(x)\}\)
is closed in \( X\) .
Definition 13 ( Homeomorphism)
A homeomorphism between two topological spaces \( (X,\mathcal{T}_{X}), (Y,\mathcal{T}_{Y})\) is a continuous bijective map \( h :(X,\mathcal{T}_{X})\to (Y,\mathcal{T}_{Y})\) whose inverse map is continuous.
Definition 14 ( Open map)
A continuous map \( f: X\to Y\) is called open if the image of any open set of \( X\) is an open set of \( Y\) .
Definition 15 ( Closed map)
A map \( f:X\to Y\) between topological spaces is said to be closed if the subset \( \Gamma_f=\{(x,f(x)) \mid x\in X\}\subset X\times Y\)
is closed in \( X\times Y\) . The set \( \Gamma_f\) is called the graph of \( f\) .
We will recall some definitions and theorems related to various properties of subsets of a topological space \( X\) .
Definition 16
Nowhere dense set :
The set \( A \subset X\) is nowhere dense in \( X\) if its closure \( \bar A\) has an empty interior.
Definition 17 ( Baire space)
A topological space is called a Baire space if the intersection of any countable family of dense open sets remains dense.
Equivalently, a topological space is Baire if the union of any countable collection of closed sets with empty interior also has an empty interior.
Complete metric spaces, as well as locally compact Hausdorff spaces, provide natural examples of Baire spaces.
Example 3
Exercise 5
Is the set of rational numbers equipped with the subspace topology a Baire space? Write a proof.
Exercise 6
Is the set of real numbers equipped with standard topology a Baire space? Write a proof.
Theorem 3 ( Baire’s Category Theorem)
Theorem 4
Definition 18
Compact : A subset \( A \subset X\) is compact if it is Hausdorff and every open covering of \( A\) has a finite subcovering.
The condition of being Hausdorff can be dropped in a large number of situations.
Locally compact: A subset is locally compact if every point has a compact neighborhood.
Notice that Euclidean spaces are locally compact but not compact.
Theorem 5
The Heine-Borel theorem concerns finite-dimensional spaces; it is not necessarily true for an arbitrary topological space. In particular, in an infinite-dimensional normed vector space, a closed bounded set with a non-empty interior is never compact for the norm topology.
Theorem 6
Let \( K\) be a compact space. The image of \( K\) under a continuous map \( f\) is also compact.
Corollary 1
Given a non-empty compact space \( K\) , any continuous real-valued function on \( K\) attains a minimum and a maximum value on \( K\) .
Exercise 7
Prove that the Cantor set is compact.
Exercise 8
Consider the set \( K\) of all functions \( f: [0, 1] \to [0, 1]\) satisfying the Lipschitz condition \( |f(x) - f(y)| \leq |x - y|\) for all \( x, y \in [0,1]\) . Consider on \( K\) the metric induced by the uniform distance \( d(f,g)=\sup_{x\in [0,1]}|f(x)-g(x)|\) .
Prove that the space \( K\) is compact.
Definition 19 ( Connectedness)
A topological space \( X\) is said to be connected if it cannot be described as the union of two disjoint (non-empty) open sets. Otherwise \( X\) is said to be non-connected.
A subspace of a topological space is said to be connected if it is connected for the induced topology.
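For a finite space, connectedness can be tested by looking for a proper non-empty open set whose complement is also open, since such a pair is exactly a decomposition into two disjoint non-empty open sets. A minimal sketch, with a hypothetical function name:

```python
def is_connected(X, T):
    """A finite space is connected when no proper non-empty subset is open and closed."""
    X = frozenset(X)
    opens = {frozenset(U) for U in T}
    for U in opens:
        if U and U != X and (X - U) in opens:
            return False
    return True

sierpinski = [set(), {1}, {1, 2}]
discrete = [set(), {1}, {2}, {1, 2}]

print(is_connected({1, 2}, sierpinski))  # True
print(is_connected({1, 2}, discrete))    # False: {1} and {2} are both open
```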
Exercise 9
Is the general linear group \( GL_n(\mathbb{R})\) (i.e. the space of square matrices of size \( n\times n\) with nonzero determinant) a connected space? Provide a proof.
Theorem 7
For a topological space \( X\) the following conditions are equivalent:
Definition 20 ( Locally connected)
A topological space \( X\) is said to be locally connected if every neighborhood of every point \( x \in X\) contains a connected neighborhood.
Examples 1
Let \( \mathbb{R}^{2}\) be equipped with the standard topology and let \( K=\{\frac{1}{n} \mid n\in \mathbb{N}\}\) . We call the comb space the set \( ([0,1]\times\{0\})\cup (K\times [0,1])\cup (\{0\}\times [0,1])\)
considered as a subspace of \( \mathbb{R}^{2}\) equipped with the induced topology. The comb is a connected space that is not locally connected.
\( \star\) Warning! A space can be connected without being locally connected.
Exercise 10
Is the curve \( y=\sin\frac{1}{x}\) on the interval \( (0,1]\) locally connected?
Exercise 11
Compute how many connected components has the real quartic surface:
where \( u=x^2+y^2+z^2,v=x^2y^2+y^2z^2+x^2z^2\) .
Do the same for:
Definition 21 ( Covering space)
A covering of a topological space \( X\) is a pair \( (\tilde X,f)\) , where \( \tilde X\) is a connected and locally connected space and \( f\) is a continuous map from \( \tilde X\) onto \( X\) , such that every point \( x\in X\) has a neighborhood \( \mathcal{N}(x)\) for which the restriction of \( f\) to each connected component \( C_{\alpha}\) of \( f^{-1}(\mathcal{N}(x))\) is a homeomorphism from \( C_{\alpha}\) to \( \mathcal{N}(x)\) .
Definition 22 ( Simply connected)
A topological space \( X\) is simply connected if \( X\) is connected and locally connected and any covering \( (\tilde X,f)\) is isomorphic to the trivial covering \( (X,Id)\) where \( Id\) is the identity map.
Example 4
A universal covering space is the "simplest" simply connected space that covers a given space. Rigorously, it is defined as follows.
Definition 23 ( Universal covering)
\( (\tilde X,f)\) is a universal covering of the space \( X\) if it is a covering and if \( \tilde X\) is simply connected.
Example 5
A covering of \( S^{1}\) is given by the pair \( (\mathbb{R},\pi)\) , where \( \pi\) is the canonical projection given by
and \( t\) is a real parameter.
Exercise 12
Does the torus \( S^1\times S^1\) have a universal covering? If yes, give it explicitly.
Demonstrate that the Kähler torus \( T^n=\mathbb{C}^n/\Lambda\) , where \( \Lambda\) is a lattice in \( \mathbb{C}^n\) (meaning that it is a discrete subgroup isomorphic to \( \mathbb{Z}^{2n}\) ), has a universal covering space.
Definition 24 ( Locally simply connected)
A topological space \( X\) is locally simply connected if every point \( x\in X\) has at least one simply connected neighborhood.
Example 6
The sphere \( S^2\) is locally simply connected, since every point on the sphere has a simply connected neighborhood. For every point on \( S^2\) one can choose a small open neighborhood around that point, homeomorphic to an open disk in \( \mathbb{R}^2\) .
Exercise 13
We consider a space which consists of infinitely many circles of decreasing radius, all tangent to a single point. This is called a Hawaiian earring. Prove that this space is not locally simply connected.
Definition 25 ( Isomorphism of covering)
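Two coverings \( (\tilde X_{1},f_{1})\) and \( (\tilde X_{2},f_{2})\) are isomorphic if there exists a homeomorphism \( \varphi:\tilde X_{1}\to \tilde X_{2}\) such that \( f_{2}\circ \varphi = f_{1}\) .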
Definition 26 ( Fundamental group)
Let \( X\) be a space that admits a universal covering \( (\tilde X,f)\) . The group of homeomorphisms \( \varphi\) of \( \tilde X\) onto itself such that \( f\circ \varphi = f\) is called the fundamental group of \( \tilde X\) .
Since two universal coverings are isomorphic, so are the fundamental groups. The corresponding abstract group is called the fundamental group of \( X\) .
Exercise 14
Compute the fundamental groups of the following topological spaces:
Theorem 8
Definition 27 ( Path-connected)
Theorem 9
If a topological space is path-connected (locally path-connected) then it is connected (locally connected).
\( \star\) Warning! The converse statement is false.
Example 7
Consider the topologist’s sine curve, which consists of the graph of the function \( y=\sin(\frac{1}{x})\) , where \( x\in (0,1]\) , together with the vertical line segment \( \{0\}\times [-1,1]\) . This space is connected; however, it is not path-connected. Indeed, there is no continuous path connecting a point on the vertical segment to a point on the sine curve. In particular, any sufficiently small neighborhood of any point on \( \{0\}\times [-1,1]\) contains infinitely many disconnected pieces of the sine curve.
Exercise 15
Provide another example of a space that is connected but not path-connected.
The notion of homotopy is a purely topological notion which allows one to consider classes of topological objects up to some (homotopical) relation.
To give an everyday life example, up to homotopy, a mug is equivalent to a doughnut (because they are both homotopy equivalent to a torus).
Definition 28 ( Homotopic paths)
Two paths \( \gamma_{1}, \gamma_{2}:[0,1]\to X\) are homotopic if there exists a continuous map \( F:[0,1]\times [0,1] \to X\) such that \( F(t,0)=\gamma_{1}(t)\) and \( F(t,1)=\gamma_{2}(t)\) .
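As an illustration of this definition, here is a minimal Python sketch under the assumption that \( X=\mathbb{R}^n\) (or any convex subset of it), where the straight-line formula \( F(t,s)=(1-s)\gamma_1(t)+s\gamma_2(t)\) always gives a homotopy. The two paths and the helper name `straight_line_homotopy` are hypothetical examples, not taken from the text.

```python
def straight_line_homotopy(gamma1, gamma2):
    """Return F(t, s) = (1 - s) * gamma1(t) + s * gamma2(t) for paths in R^n."""
    def F(t, s):
        p, q = gamma1(t), gamma2(t)
        return tuple((1.0 - s) * a + s * b for a, b in zip(p, q))
    return F

# Two hypothetical paths in the plane, both going from (0, 0) to (1, 0).
gamma1 = lambda t: (t, 0.0)
gamma2 = lambda t: (t, t * (1.0 - t))

F = straight_line_homotopy(gamma1, gamma2)
print(F(0.5, 0.0), F(0.5, 1.0))
```

In a space that is not convex, this formula may leave the space, which is precisely why homotopy classes of paths carry topological information.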
Example 8
In the figure below, we draw a pair of homotopic planar curves. These curves are the level curves of the real and imaginary parts of a pair of complex polynomials.

Can you tell the degree of those curves? Give (approximately) the equations of those polynomials.
Definition 29 ( Homotopic map)
Theorem 10
If \( X\) is path-connected and locally path-connected, it is simply connected if every closed path \( \gamma\) in \( X\) is homotopic to a constant map.
The notion of homotopy between two functions allows one to define an equivalence relation between topological spaces.
Definition 30 ( Homotopically equivalent spaces)
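Two spaces \( X\) and \( Y\) are said to be homotopically equivalent (or “of the same homotopy type”) if there exist two continuous maps \( f:X\to Y\) and \( g:Y\to X\) such that \( g\circ f\) is homotopic to the identity map of \( X\) and \( f\circ g\) is homotopic to the identity map of \( Y\) .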
Definition 31 ( Contractible space)
A space is called contractible if it is homotopically equivalent to a point.
This definition is equivalent to saying that its identity map is homotopic to a constant map.
Example 9
The space \( \mathbb{R}^{n}\) is contractible.
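A standard way to see this is the homotopy \( H(x,t)=(1-t)x\) , which deforms the identity map (at \( t=0\) ) into the constant map \( 0\) (at \( t=1\) ). The short Python sketch below evaluates it on one sample point; the function name `contraction` and the sample point are illustrative choices.

```python
def contraction(x, t):
    """H(x, t) = (1 - t) * x: identity of R^n at t = 0, constant map 0 at t = 1."""
    return tuple((1.0 - t) * xi for xi in x)

point = (1.0, -2.0, 3.0)
for t in (0.0, 0.5, 1.0):
    print(t, contraction(point, t))
```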
Theorem 11
Two homeomorphic topological spaces are homotopically equivalent.
\( \star\) Warning! The converse is false, as shown by the following examples:
Definition 32 ( Disjoint sum)
The disjoint sum of an indexed family of sets \( \{E_i\}_{i\in I}\) is defined by:
\[ \coprod_{i\in I}E_i=\bigcup_{i\in I}\{(x,i)\mid x\in E_i\}. \]
(1)
Note: This definition of the disjoint sum makes it possible to handle the case where \( E_{i}\cap E_{j}\ne \emptyset\) for some \( i\neq j\) . Indeed, the elements of the disjoint sum are ordered pairs \( (x,i)\) , where \( i\in I\) serves as an auxiliary index that indicates which \( E_i\) the element \( x\) came from. Each set \( E_i\) is in canonical bijection with the set \( \tilde E_i=\{(x,i): x\in E_i\}\) . So, if we define \( \varphi_j:E_j \to \coprod_{i\in I}E_i\) by \( \varphi_j(x)=(x,j)\) , where \( j\in I\) , then \( \varphi_{j}\) is a bijection from \( E_j\) onto \( \tilde E_j=\{(x,j)\mid x\in E_j\}\) and the \( \tilde E_j\) are pairwise disjoint, even if the \( E_j\) are not.
Usually to state that the union is a disjoint union we replace \( \bigcup\) by \( \bigsqcup\) .
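The tagging construction can be made concrete on finite sets. The following Python sketch is only an illustration (the family and the function names `disjoint_sum` and `inject` are hypothetical): it builds the set of pairs \( (x,i)\) and compares it with the ordinary union for two overlapping sets.

```python
def disjoint_sum(family):
    """Disjoint sum of an indexed family {i: E_i}, as a set of pairs (x, i)."""
    return {(x, i) for i, E in family.items() for x in E}

def inject(j, x):
    """Canonical injection phi_j: E_j -> disjoint sum, x |-> (x, j)."""
    return (x, j)

family = {1: {"a", "b"}, 2: {"b", "c"}}   # E_1 and E_2 share the element "b"
total = disjoint_sum(family)
union = set().union(*family.values())

# The union identifies the two copies of "b"; the disjoint sum keeps them apart.
print(len(union), len(total))
print(inject(1, "b"), inject(2, "b"))
```

The shared element appears once in the union but twice in the disjoint sum, once for each index.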
Definition 33 ( Disjoint sum topology)
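The disjoint sum topology on \( \coprod_{i\in I}E_{i}\) , where each \( E_i\) is a topological space with topology \( \mathcal{T}_i\) , is defined by
\[ \mathcal{T}=\Big\{\mathcal{U}\subset \coprod_{i\in I}E_{i} \;\Big|\; \varphi_{j}^{-1}(\mathcal{U})\in \mathcal{T}_{j} \text{ for all } j\in I\Big\}. \]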
Proposition 1 (Fundamental properties of the disjoint sum topology)
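The maps \( \varphi_{j}: E_{j}\to \coprod_{i\in I}E_{i}\) are continuous, open and closed; they are in fact homeomorphisms onto their images.
Let \( Y\) be a topological space and \( f: \coprod_{i\in I}E_{i}\to Y\) be a map. Then \( f\) is continuous if and only if \( f\circ \varphi_{j}\) is continuous for every \( j\in I\) .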
Proof
It is clear that \( \emptyset\) and \( \coprod_{i\in I}E_{i}\) belong to \( \mathcal{T}\) .
Axioms (2) and (3) are a consequence of the fact that the \( \mathcal{T}_{i}\) are topologies and of the following two set-theoretic properties of the operation \( \varphi_{j}^{-1}\) :
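if we have a family \( \{\mathcal{U}_{\lambda}\}_{\lambda\in\Lambda}\) of subsets of \( \coprod_{i\in I}E_{i} \) , then:
\[ \varphi_{j}^{-1}\Big(\bigcup_{\lambda\in\Lambda}\mathcal{U}_{\lambda}\Big)=\bigcup_{\lambda\in\Lambda}\varphi_{j}^{-1}(\mathcal{U}_{\lambda}); \]
if \( \mathcal{U}_{1},\dots,\mathcal{U}_{N} \subset \coprod_{i\in I}E_{i}\) , then:
\[ \varphi_{j}^{-1}\Big(\bigcap_{k=1}^{N}\mathcal{U}_{k}\Big)=\bigcap_{k=1}^{N}\varphi_{j}^{-1}(\mathcal{U}_{k}). \]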
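By the above statement, the \( \varphi_{i}\) , where \( i \in I\) , are continuous. If \( \mathcal{U} \subset E_{i}\) is open, let us see that \( \varphi_{i}(\mathcal{U})\) is open. Indeed:
\[ \varphi_{j}^{-1}(\varphi_{i}(\mathcal{U}))=\begin{cases}\mathcal{U} & \text{if } j=i,\\ \emptyset & \text{if } j\neq i,\end{cases} \]
which is open in \( E_{j}\) for every \( j\in I\) , and therefore \( \varphi_{i}(\mathcal{U})\) is an open set for the topology that we defined on \( \coprod_{i\in I}E_{i}\) .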
If \( F\subset E_{j}\) is closed, let \( A=\left(\coprod_{i\in I}E_i\right)\setminus\varphi_j(F)\) . This implies
\[ \varphi_{i}^{-1}(A)=\begin{cases}E_{j}\setminus F & \text{if } i=j,\\ E_{i} & \text{if } i\neq j,\end{cases} \]
and therefore \( A\) is open for the topology that we have defined on \( \coprod_{i\in I}E_i\) , which makes \( \varphi_j(F)\) closed. Since each \( \varphi_{j}\) is injective, continuous and open, it follows that the \( \varphi_{j}\) are homeomorphisms onto their images.
If \( f: \coprod\limits_{i\in I}E_i\to Y\) is continuous, then for all \( j \in I\) the composition \( f \circ \varphi_j\) is also continuous.
Conversely, suppose that \( f\circ \varphi_j\) is continuous for all \( j\in I\) and that \( V \subset Y\) is an open set. Then \( (f\circ\varphi_j)^{-1}(V)=\varphi_j^{-1}(f^{-1}(V))\) is open in \( E_j \) , for all \( j \in I\) , which means that \( f^{-1}(V)\) is open in \( \coprod_{i\in I}E_i\) . So, \( f\) is continuous.
Ex. 1 & 7 The Cantor set is a metric space. The Cantor set is an uncountable set with Lebesgue measure zero. As the complement of a union of open sets, it is a closed subset of the real numbers and, consequently, a complete metric space. Furthermore, since it is totally bounded, the Heine–Borel theorem ensures that it is compact.
Ex.2 Let \( p\) be a given point of the space \( X\) . By hypothesis \( X\) is a \( \bf{ T}_2\) space. Therefore, it follows that every point \( x\neq p\) belongs to an open set \( G_x\) that does not contain \( p\) , and thus that
\[ X\setminus\{p\}=\bigcup_{x\neq p}G_x. \]
This implies that \( X\setminus\{p\}\) is open, in other words \( \{p\}\) is closed.
Ex. 3 Consider the following topological space:
We equip \( X\) with a topology defined as follows:
Let us check that \( X\) is a \( \bf{ T_1}\) Space. To show that \( X\) is \( \bf{ T_1}\) , we need to prove that every singleton set \( \{x\}\) is closed. Consider:
Therefore, every singleton is closed, and \( X\) is \( \bf{ T_1}\) .
Let us check if \( X\) is \( \bf{ T_2}\) .
To summarize,
Ex.4 It is enough to show that the set
\[ A=\{x : f(x)\neq g(x)\} \]
is open. For every \( x\in A\) there exist in \( Y\) two open sets \( U\) and \( V\) such that \( f(x)\in U\) and \( g(x)\in V\) and their intersection is empty. By continuity of \( f\) and \( g\) , the set \( f^{-1}(U)\cap g^{-1}(V)\) is an open neighborhood of the point \( x\) , and it is contained in \( A\) . Therefore, \( A\) is an open set.
Ex.5 Let us consider the example of a non-Baire space: the set of rational numbers with the subspace topology.
The set of rational numbers is countable. In particular, we can write
\[ \mathbb{Q}=\bigcup_{q\in \mathbb{Q}}\{q\} \]
as a countable union of singletons.
Each singleton \( \{q\}\) is closed in \( \mathbb{Q}\) (subspace topology) but has empty interior: thus, \( \{q\}\) is nowhere dense. Therefore, \( \mathbb{Q}\) is a countable union of nowhere dense sets and \( \mathbb{Q}\) cannot be a Baire space.
Ex.6 The real numbers \( \mathbb{R}\) , equipped with the standard Euclidean topology, form a Baire space.
A space is Baire if the intersection of countably many dense open subsets is dense. By the Baire Category Theorem, every complete metric space is a Baire space. Therefore, the conclusion follows.
Ex.8 This is an application of the Arzela–Ascoli theorem.
Ex.9 No. This is due to the fact that \( GL_n(\mathbb{R})\) splits into two disjoint subsets: the set of matrices with strictly positive determinant and the set of matrices with strictly negative determinant.
Ex.10 The curve \( y=\sin\frac{1}{x}\) on the interval \( (0,1]\) , taken together with its limit segment \( \{0\}\times[-1,1]\) , is connected but neither locally connected nor arc-connected. As \( x\to 0^+\) , \( \frac{1}{x}\) grows without bound, causing \( \sin\frac{1}{x}\) to oscillate infinitely often between the values \( -1\) and \( 1\) . By definition, a space is locally connected if every neighborhood of any point contains a connected open neighborhood of that point. Here, for every sufficiently small neighborhood \( U\) of a point \( (0,y)\) , the part of the graph within \( U\) consists of infinitely many pairwise disjoint arcs (due to the oscillations). Thus, there is no connected open neighborhood of \( (0,y)\) inside \( U\) .
Ex.11 The first quartic has eight non-compact connected components, while the second has only one. This can be shown by leveraging the fact that these quartics are invariant under the Coxeter group \( CB_3\) , which corresponds to the symmetries of the cube. By decomposing space into Coxeter chambers, the quartic can be analyzed within a single fundamental domain. Applying an appropriate change of variables, \( x_i^2\mapsto X_i\) where \( i\in \{1,2,3\}\) , within this domain reduces the problem to studying degree-2 surfaces, defined in a cone contained in a positive octant (for instance the cone delimited by the equations \( \{x_1=0\},\{x_2=0\},\{x_3=0\}\) and \( \{x_i=x_j\}\) where \( i\neq j\) ). Since the classification of quadrics is well understood, this approach simplifies the analysis. See [21] for a complete classification of such quartics.
Ex.12 The universal covering space of the torus is the Euclidean plane \( \mathbb{R}^2\) , with the covering map: \( \pi:\mathbb{R}^2\to T^2,\) \( \pi(x,y)=(\exp{2\pi\imath x},\exp{2\pi\imath y})\) .
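The key property of this covering map is that it does not distinguish points of \( \mathbb{R}^2\) that differ by an integer vector. A minimal numerical check in Python is sketched below; the sample point, the lattice vector \( (3,-2)\) , the tolerance and the helper names are arbitrary illustrative choices.

```python
import cmath
import math

def torus_cover(x, y):
    """pi(x, y) = (exp(2*pi*i*x), exp(2*pi*i*y)), a point of S^1 x S^1."""
    return (cmath.exp(2j * math.pi * x), cmath.exp(2j * math.pi * y))

def close(p, q, tol=1e-9):
    """Compare two points of the torus coordinate by coordinate."""
    return all(abs(a - b) < tol for a, b in zip(p, q))

base = torus_cover(0.25, 0.6)
shifted = torus_cover(0.25 + 3, 0.6 - 2)   # translate by the integer vector (3, -2)
other = torus_cover(0.75, 0.6)             # not an integer translate of the base point

print(close(base, shifted), close(base, other))
```

Points that differ by an integer vector have the same image, so the fiber over each point of the torus is a translate of the lattice \( \mathbb{Z}^2\) .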
The Kähler torus (that is a torus equipped with a Kähler structure) admits a universal covering. In fact, the universal covering of a Kähler torus is the same as the universal covering of a standard torus, because the Kähler structure does not affect the underlying topology. The map \( \pi\) projects each point \( z\in \mathbb{C}^n\) to its equivalence class \( z+\Lambda\) in the quotient space \( \mathbb{C}^n/\Lambda\) . This is a smooth, holomorphic map that respects the complex and Kähler structures. The space \( \mathbb{C}^n\) is simply connected because it is contractible.
Ex.13 The Hawaiian earring is a classic example of a space that is not locally simply connected. It consists of infinitely many circles of decreasing radius, all tangent to a single point. Every point other than the point of tangency has a neighborhood homeomorphic to an open arc, which is simply connected. The space fails to be locally simply connected at the point of tangency: any neighborhood of that point contains all but finitely many of the circles, and a loop running once around one of these circles cannot be contracted to a point within the neighborhood. Hence no neighborhood of the tangency point is simply connected.
Ex.14 Fundamental groups.
Ex.15 There are many examples. We can take the example of the deleted comb space. The comb space is a subspace of \( \mathbb{R}^2\) which looks like a comb. A comb space is defined by the set:
\[ (\{0\}\times[0,1])\cup(K\times[0,1])\cup([0,1]\times\{0\}), \]
where \( K=\{\frac{1}{n} \mid n\in\mathbb{N}^*\}\) . The deleted comb space is obtained from it by removing the open segment \( \{0\}\times(0,1)\) . However, the aim of this exercise is to create your own example, using your imagination.

A topological manifold is a fundamental object in mathematics, serving as an abstraction of spaces that locally resemble Euclidean space but exhibit additional properties. Manifolds provide the natural setting for geometry and topology, forming the foundation for different disciplines in mathematics.
Formally, an \( n\) -dimensional topological manifold is a Hausdorff topological space that is locally homeomorphic to \( \mathbb{R}^n\) . This means that around every point, there exists a neighborhood that behaves like an open subset of Euclidean space, allowing us to rely on Euclidean intuition locally, while still permitting the existence of global structures (such as curvature, holes, or any other topological feature).
Topological manifolds can be studied with some extra structures:
The simplest examples of topological manifolds include familiar spaces such as the circle \( S^1\) , the sphere \( S^n\) , the torus \( T^n\) , as well as more sophisticated objects like projective spaces.
Definition 34 (Topological Manifolds)
Let \( \mathcal{M}\) be a set, and let \( (\mathcal{E}_i)_{i\in I}\) be a family of Hausdorff topological vector spaces. Consider a collection \( \mathcal{A}\) of pairs \( \{(\mathcal{U}_i, \varphi_i)\}_{i \in I}\) , indexed by some set \( I\) , where each \( \mathcal{U}_i\) is a subset of \( \mathcal{M}\) and each \( \varphi_i\) is a mapping associated to it, subject to the following axioms:
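\( \bullet\) \( \mathcal{M} 1.\) The sets \( \mathcal{U}_i\) form a cover of \( \mathcal{M}\) :
\[ \mathcal{M}=\bigcup_{i\in I}\mathcal{U}_i. \]
\( \bullet\) \( \mathcal{M} 2.\) Each \( \varphi_i\) is a bijection between \( \mathcal{U}_i\) and an open subset of \( \mathcal{E}_i\) :
\[ \varphi_i:\mathcal{U}_i\to \varphi_i(\mathcal{U}_i)\subset \mathcal{E}_i. \]
\( \bullet\) \( \mathcal{M} 3.\) For each pair of indices \( i, j \in I\) , the transition maps
\[ \varphi_j\circ\varphi_i^{-1}:\varphi_i(\mathcal{U}_i\cap \mathcal{U}_j)\to \varphi_j(\mathcal{U}_i\cap \mathcal{U}_j) \]
define homeomorphisms between the respective open subsets of \( \mathcal{E}_i\) and \( \mathcal{E}_j\) .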
The data \( (\mathcal{M}, \mathcal{A})\) thus defines a topological manifold modeled on the spaces \( \mathcal{E}_i\) .
Definition 35 (Chart)
Let \( \mathcal{M}\) be a topological manifold. Each pair \( (\mathcal{U}_i, \varphi_i) \in \mathcal{A}\) , where \( \mathcal{U}_i\) is an open subset of \( \mathcal{M}\) and \( \varphi_i\) is a mapping of \( \mathcal{U}_i\) onto an open subset of a topological vector space, is called a chart (or coordinate system) of \( \mathcal{M}\) .
If a point m \( \in \mathcal{M}\) lies in \( \mathcal{U}_i\) , we say that \( (\mathcal{U}_i, \varphi_i)\) is a chart at m.
Definition 36 (Atlas)
An atlas on a manifold \( \mathcal{M}\) is a collection of charts \( \{(\mathcal{U}_i, \varphi_i)\}_{i \in I}\) such that:
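The sets \( \mathcal{U}_i\) cover \( \mathcal{M}\) :
\[ \bigcup_{i\in I}\mathcal{U}_i=\mathcal{M}. \]
The transition maps
\[ \varphi_j\circ\varphi_i^{-1}:\varphi_i(\mathcal{U}_i\cap \mathcal{U}_j)\to \varphi_j(\mathcal{U}_i\cap \mathcal{U}_j) \]
are homeomorphisms for all \( i, j\) for which \( \mathcal{U}_i \cap \mathcal{U}_j \neq \emptyset\) .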
It is natural to assume that the topology on \( \mathcal{M}\) is given a priori. In many cases, one requires that each chart \( \varphi_i\) be a homeomorphism onto its image. As a first consequence of this definition, we obtain the following structural result:
Proposition 2
One can give to \( \mathcal{M}\) a topology in a unique way such that each \( \mathcal{U}_i\) is open, and the \( \varphi_i\) are topological isomorphisms.
Remark 1
The condition that \( \mathcal{M}\) is Hausdorff is not necessary. This condition plays no role in the formal development of manifolds.
However, in practical applications, \( \mathcal{M}\) is Hausdorff.
Exercise 16
Prove that the special linear group \( SL_n(\mathbb{R})\) is a smooth submanifold of \( \mathbb{R}^{n\times n}\) .
Exercise 17
Is the set \( \{(x,y)\in \mathbb{R}^2 \mid xy=0\}\) a submanifold of \( \mathbb{R}^2\) ?
\( \star\) In what follows, we impose the Hausdorff condition on all manifolds under consideration. Furthermore, any construction performed subsequently—such as products, tangent bundles, or fibered structures—will be required to yield spaces that remain Hausdorff. This ensures a well-behaved topological framework in which geometric operations preserve separation properties.
In the formulation given above, particularly in condition \( \mathcal{M} 2\) , no global assumption was imposed on the structure of the topological vector spaces \( \mathcal{E}_i\) used as local models. In particular, we did not require that all \( \mathcal{E}_i\) be the same, nor that there exist continuous linear isomorphisms between them.
However, one of the most natural and frequently encountered cases in practice is that in which there exists a fixed topological vector space \( \mathcal{E}\) such that each \( \mathcal{E}_i\) is isomorphic to \( \mathcal{E}\) , allowing a uniform model for the local structure of \( \mathcal{M}\) . In such a setting, one can regard \( \mathcal{M}\) as being locally modeled on a single space \( \mathcal{E}\) , which provides a more rigid but also more manageable framework for various constructions.
This remark leads us to the notion of modeled manifold:
Definition 37 (Modeled manifold)
Let \(\mathcal{M}\) be a topological space. An \(\mathcal{E}\)-atlas on \(\mathcal{M}\) consists of a collection of charts \(\{(U_i, \varphi_i)\}_{i \in I}\), where:
If an \(\mathcal{E}\)-atlas is given, we say that \(\mathcal{M}\) is an \(\mathcal{E}\)-modeled manifold (or simply an \(\mathcal{E}\)-manifold).
If the transition maps (i.e. maps of the type \(\varphi_j \circ \varphi_i^{-1}\)) satisfy additional compatibility conditions (such as differentiability, smoothness, or analyticity), then the \(\mathcal{E}\)-manifold is endowed with some extra structure.
In the following, we primarily focus on modeled manifolds whose model space is a vector space that admits a well-defined differentiable structure. This is the case for manifolds modeled on Banach spaces or Hilbert spaces, where a differentiable structure can be naturally defined. However, we are also interested in a generalization of normed spaces, namely locally convex topological vector spaces.
Example 10
A Hilbert manifold is a manifold modeled on a Hilbert space, which is a complete inner product space. Hilbert manifolds generalize finite-dimensional smooth manifolds to infinite dimensions.
Transition Maps: Given a pair of charts \( (U_a,\phi_a)\) and \( (U_b,\phi_b)\) , the transition map
\[ \phi_b\circ\phi_a^{-1}:\phi_a(U_a\cap U_b)\to \phi_b(U_a\cap U_b) \]
is a smooth map between open subsets of \( H\) .
Exercise 18
Consider a smooth finite-dimensional manifold \( M\) . The loop space \( LM\) is the space of all smooth maps:
\[ \gamma:S^1\to M, \]
where \( S^1\) is the unit circle. Is the loop space \( LM\) a modeled manifold?
Exercise 19
Let \( M\) and \( N\) be finite dimensional smooth manifolds. Let \( C^k(M,N)\) be the space of \( C^k\) -differentiable maps from \( M\) to \( N\) . Prove that this space can be given the structure of a Banach manifold modeled on a Banach space.
Let us recall that a locally convex space \( X\) is a vector space over \( \mathbb{K}\) , a (sub)field of the complex numbers (it can be \( \mathbb{C}\) itself or \( \mathbb{R}\) for instance).
A locally convex space is defined either in terms of convex sets or equivalently in terms of seminorms. In fact, a topological vector space \( X\) is said to be locally convex if it verifies one of the following two equivalent properties. We start with the convex set definition.
Definition 38 ( Convex sets definition)
A topological vector space \( X\) is said to be locally convex if there exists a neighborhood basis (that is, a local base) at the origin, consisting of balanced convex sets.
We elaborate on those two notions of convexity and of balanced sets.
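A subset \( C\subset X\) is called convex if for all \( x,y\in C\) , and \( 0\leq t\leq 1\) we have
\[ tx+(1-t)y\in C. \]
A subset \( C\subset X\) is called balanced if \( \lambda C\subset C\) for all \( \lambda\in \mathbb{K}\) with \( |\lambda|\leq 1\) .
A convex subset \( C\subset X\) is called absorbent if for every \( x\in X\) there exists \( r>0\) such that
\[ x\in tC \]
for all \( t\in \mathbb{K}\) with \( |t|>r\) . In other words, the set \( C\) can be scaled up by any sufficiently large factor to absorb any given point of the space.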
In any topological vector space, every neighborhood of the origin is absorbent.
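A second possible viewpoint on this notion can be achieved using seminorms. A seminorm on \( X\) is a function
\[ p:X\to \mathbb{R} \]
such that:
Non-negativity: \( p\) is nonnegative or positive semidefinite, i.e.
\[ p(x)\geq 0 \quad \text{for all } x\in X. \]
Scaling property: \( p\) is absolutely homogeneous:
\[ p(\lambda x)=|\lambda|\,p(x) \quad \text{for all } \lambda\in \mathbb{K},\ x\in X. \]
So, in particular, \( p(0)=0\) .
Subadditivity: \( p\) is subadditive, that is, it satisfies the triangle inequality:
\[ p(x+y)\leq p(x)+p(y) \quad \text{for all } x,y\in X. \]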
Definition 39
A topological vector space \( X\) over the field \( \mathbb{C}\) or \( \mathbb{R}\) is said to be locally convex if its topology is generated by a family \( \mathcal{P}=\{p_i\}_{i\in I}\) of seminorms on \( X\) , where \( I\) is an index set. Each seminorm \( p_i:X\to \mathbb{R}\) satisfies the properties of a seminorm (non-negativity, absolute homogeneity, and subadditivity).
Remark 2
Although the definition in terms of a neighborhood base gives a better geometric picture and intuition, the definition in terms of seminorms is easier to work with in practice.
Exercise 20
Consider the space of continuous functions \( C(\mathbb{R})\) on \( \mathbb{R}\) . Verify whether the family of functions \( \{p_n\}_{n\in \mathbb{N}}\) , given by:
forms a family of seminorms. Is \( C(\mathbb{R})\) locally convex? Do the \( \{p_n(f)\}\) generate a locally convex topology?
As mentioned earlier, both definitions are equivalent. We outline a sketch of the proof showing this equivalence.
\( \star\) The equivalence of those two definitions follows from a construction known as the Minkowski functional or Minkowski gauge. The key feature of seminorms which ensures the convexity of their \( \varepsilon\) -balls is the triangle inequality.
For an absorbing set \( C\) such that \( t\cdot x\in C\) whenever \( x\in C\) and \( 0\leq t\leq 1\) , let us define the Minkowski functional of \( C\) to be
\[ \mu_{C}(x)=\inf\{\lambda>0 : x\in \lambda C\}. \]
From this definition, it follows that \( \mu _{C}\) is a seminorm if \( C\) is balanced and convex (it is also absorbent by assumption).
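To make the Minkowski functional concrete, recall that it is given by \( \mu_C(x)=\inf\{\lambda>0 : x\in\lambda C\}\) . The Python sketch below approximates this infimum by bisection for a set described by a membership test. Everything in it is an illustrative assumption: the set \( C\) is taken to be the open unit square in \( \mathbb{R}^2\) (convex, balanced and absorbent), for which the gauge is \( \max(|x_1|,|x_2|)\) , and the names `minkowski_gauge` and `in_square` are hypothetical.

```python
def minkowski_gauge(contains, x, tol=1e-9):
    """Approximate inf{lam > 0 : x in lam*C} by bisection.

    `contains(y)` tests membership of y in C. The method assumes that
    t*C is contained in C for 0 <= t <= 1, so that the admissible
    values of lam form an interval unbounded to the right.
    """
    hi = 1.0
    while not contains(tuple(xi / hi for xi in x)):   # find an admissible scale
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if contains(tuple(xi / mid for xi in x)):
            hi = mid
        else:
            lo = mid
    return hi

def in_square(y):
    """Open unit square: a convex, balanced, absorbent subset of R^2."""
    return max(abs(y[0]), abs(y[1])) < 1.0

x, y = (3.0, -2.0), (-1.0, 4.0)
x_plus_y = (x[0] + y[0], x[1] + y[1])
print(minkowski_gauge(in_square, x))          # close to max(|3|, |-2|)
print(minkowski_gauge(in_square, x_plus_y))   # compare with the sum of the two gauges
```

On this example the computed values agree, up to the tolerance, with homogeneity and the triangle inequality, which is what one expects from a seminorm.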
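Conversely, given a family of seminorms \( \{p_{\alpha}\}_{\alpha\in A}\) , the sets
\[ \{x\in X : p_{\alpha_1}(x)<\varepsilon,\dots,p_{\alpha_n}(x)<\varepsilon\}, \]
where \( \varepsilon>0\) and \( \alpha_1,\dots,\alpha_n\in A\) range over all finite collections of indices, form a base of convex absorbent balanced sets.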
Let us mention some interesting properties.
If \( p\) is positive definite (that is, \( p(x)=0\) implies \( x=0\) ), then \( p\) is a norm.
\( \star\) While in general seminorms need not be norms, there is an analogue of this criterion for families of seminorms and separatedness, defined below.
A locally convex topological vector space is a topological vector space in which every neighborhood of \(0\) contains an open neighborhood \(U\) of \(0\) such that, for all \(x, y \in U\) and \(0 \leq t \leq 1\),
\[ tx+(1-t)y\in U. \]
This property turns out to be essential in the context of the implicit function theorem, for Banach spaces.
The latter result, known as the Banach Space Implicit Function Theorem, can be stated as follows:
Let \(X\), \(Y\) and \(Z\) be Banach spaces and let \(\mathcal{U}\) be an open subset of \(X \times Y\). Suppose that \(f : \mathcal{U} \to Z\) is a continuously differentiable (\(C^1\)) mapping such that \(f(a, b) = 0\) for some \((a, b) \in \mathcal{U}\) and the partial derivative \(D_y f(a, b)\) is a linear isomorphism from \(Y\) onto \(Z\).
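Then, there exist an open neighborhood \( \mathcal{W}\) of \( a\) in \( X\) , an open neighborhood \( \mathcal{V}\subset \mathcal{U}\) of \( (a,b)\) and a \( C^1\) mapping \( g:\mathcal{W}\to Y\) such that
\[ f(x,g(x))=0 \quad \text{and} \quad (x,g(x))\in \mathcal{V} \quad \text{for all } x\in \mathcal{W}. \]
In other words, the implicit equation \( f(x,y)=0\) has for \( x\in \mathcal{W}\) a solution \( y=g(x)\) of class \( C^1\) such that \( (x,y)\in \mathcal{V}\) .
This solution is unique in an open set \( \mathcal{W}'\subset \mathcal{W}\) .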
\( \star\) Warning! It is important to note that not every locally convex topological vector space admits a differentiable structure in a meaningful way. Nevertheless, a particular class of locally convex topological vector spaces, known as Fréchet spaces, plays a fundamental role in various areas of mathematics, such as statistics and the theory of partial differential equations.
A Fréchet space \( X\) is defined as a locally convex, metrizable, and complete topological vector space, meaning that every Cauchy sequence in \( X\) converges to a point in \( X\) .
For general normed vector spaces \( X,Y\) , the Fréchet derivative of a function \( f:\mathcal{U}\to Y\) , where \( \mathcal{U}\) is an open subset of \( X\) , exists at \( x\in \mathcal{U}\) if there exists a bounded linear operator \( A:X\to Y\) such that
\[ \lim_{\|h\|_X\to 0}\frac{\|f(x+h)-f(x)-Ah\|_Y}{\|h\|_X}=0. \]
The Fréchet derivative in finite-dimensional spaces is the usual derivative. It is represented in coordinates by the Jacobian matrix.
Ex. 18 The loop space \( LM\) is modeled on the Hilbert space \( H=L^2(S^1,\mathbb{R}^n)\) , where \( n=dim(M)\) . Each loop \( \gamma\) can be locally approximated by functions in \( L^2(S^1,\mathbb{R}^n)\) . For a fixed loop \( \gamma\in LM\) , a chart around \( \gamma\) can be constructed using the exponential map on \( M\) . For a small neighborhood \( U\) of \( \gamma\) , the chart maps \( U\) to an open subset of \( H\) .
Ex. 19 This space can be given the structure of a Banach manifold, modeled on a Banach space of sections of a vector bundle. The model space is the Banach space \( C^k(M,\mathbb{R}^n)\) where \( n=dim(N)\) . For a fixed map \( f\in C^k(M,N)\) , a chart around \( f\) can be constructed using the exponential map on \( N\) . Specifically, for a small neighborhood \( U\) of \( f\) , the chart maps \( U\) to an open subset of \( C^k(M,\mathbb{R}^n)\) .
Ex. 20. Yes, each \( p_n\) forms a seminorm. The family \( \{p_n\}_{n\in \mathbb{N}}\) generates a locally convex topology on \( C(\mathbb{R})\) . In this topology, a sequence of functions \( f_k\) converges to \( f\) if and only if \( p_n(f_k-f)\to 0\) , for every \( n\in \mathbb{N}\) .
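A numerical sanity check can accompany this answer. The Python sketch below assumes, as a labeled hypothesis, that the seminorms are \( p_n(f)=\sup_{|x|\leq n}|f(x)|\) ; this formula is the customary choice but it is an assumption here. The supremum is only approximated on a finite grid, and the function names and test functions are illustrative.

```python
import math

def p(n, f, samples=2001):
    """Grid approximation of sup_{|x| <= n} |f(x)| (assumed form of p_n)."""
    return max(abs(f(-n + 2.0 * n * k / (samples - 1))) for k in range(samples))

f = math.sin
g = lambda x: x * x - 1.0
f_plus_g = lambda x: f(x) + g(x)
bump = lambda x: max(0.0, abs(x) - 1.0)   # vanishes on [-1, 1] but is not the zero function

for n in (1, 2, 3):
    # Subadditivity on the grid: p_n(f + g) <= p_n(f) + p_n(g).
    print(n, p(n, f_plus_g), p(n, f) + p(n, g))

# p_1 vanishes on a nonzero function, so p_1 is a seminorm and not a norm.
print(p(1, bump), p(2, bump))
```

The last line illustrates why a whole family of seminorms is needed: a single \( p_n\) cannot separate functions that differ only outside \( [-n,n]\) .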
Ex. 16. This exercise is slightly more advanced since this statement can be shown using the Regular Value Theorem, which we have not considered in this book. We will use this exercise as a possibility to mention and illustrate this theorem!
The Regular Value Theorem states that if
\[ f:M\to N \]
is a smooth map between two smooth manifolds and \( y\in N\) is a regular value of \( f\) , then \( f^{-1}(y)\) is a smooth submanifold of \( M\) . In our case, we have \( M=Mat_n(\mathbb{R})=\mathbb{R}^{n^2}\) , \( N=\mathbb{R}\) and \( f=\det\) , so that \( SL_n(\mathbb{R})=\det^{-1}(1)\) . One needs to prove that 1 is a regular value of \( \det\) . This can be done by showing that for every matrix \( A\in SL_n(\mathbb{R})\) the derivative \( d(\det)_A\) is surjective.
We now compute the derivative of the determinant. The derivative of the determinant function at a matrix \( A\in Mat_n(\mathbb{R})\) , applied to \( H\in Mat_n(\mathbb{R})\) , is given by the following formula:
\[ d(\det)_A(H)=tr(adj(A)\,H), \]
where \( adj\) stands for the classical adjoint of the square matrix \( A\) (it is defined as the transpose of the cofactor matrix of \( A\) ) and \( tr \) is the trace operator. If \( A\in SL_n(\mathbb{R})\) then the formula becomes
\[ d(\det)_A(H)=tr(A^{-1}H), \]
since \( adj(A)=A^{-1}\) if \( A\in SL_n(\mathbb{R})\) .
Finally, the last step is to show the surjectivity of the derivative. In other words, we need to demonstrate that for any real number \( c\in \mathbb{R}\) , there exists a matrix \( H\in Mat_n(\mathbb{R})\) such that:
\[ d(\det)_A(H)=tr(A^{-1}H)=c. \]
An easy example of such a matrix is \( H=\frac{c}{n}A\) . Indeed,
\[ tr\Big(A^{-1}\cdot\frac{c}{n}A\Big)=\frac{c}{n}\,tr(I_n)=c. \]
Therefore, \( d(\det)_A\) is surjective for all \( A\in SL_n(\mathbb{R})\) .
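The formula \( d(\det)_A(H)=tr(adj(A)\,H)\) can be checked numerically against a finite difference. The Python sketch below does this for \( n=2\) , with an arbitrarily chosen matrix \( A\) of determinant 1 and the direction \( H=\frac{c}{n}A\) , for which \( tr(A^{-1}H)=\frac{c}{n}\,tr(I_n)=c\) . The matrix, the value of \( c\) , the step size and the helper names are illustrative assumptions.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def adj2(m):
    """Classical adjoint (transpose of the cofactor matrix) of a 2x2 matrix."""
    return [[m[1][1], -m[0][1]], [-m[1][0], m[0][0]]]

def trace_of_product(a, b):
    """tr(a b) for 2x2 matrices."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))

def d_det(a, h):
    """Derivative of det at a in the direction h: tr(adj(a) h)."""
    return trace_of_product(adj2(a), h)

A = [[2.0, 1.0], [3.0, 2.0]]                          # det A = 1, so A lies in SL_2(R)
c, n = 5.0, 2
H = [[c / n * entry for entry in row] for row in A]   # H = (c/n) A

t = 1e-6
A_plus_tH = [[A[i][j] + t * H[i][j] for j in range(2)] for i in range(2)]
finite_difference = (det2(A_plus_tH) - det2(A)) / t

print(d_det(A, H), finite_difference)
```

Both numbers should be close to \( c\) , the second one only up to an error of the order of the step size.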
Ex. 17. The equation \( f(x,y)=xy=0\) in the Euclidean plane \( \mathbb{R}^2\) describes the union of the two coordinate axes: the \( x\) -axis (corresponding to \( y=0\) ) and the \( y\) -axis (corresponding to \( x=0\) ). This set is not a submanifold of \( \mathbb{R}^2\) in the usual sense of differential geometry.
Observe that there exists a singular point at \( (0,0)\) . The set
\[ S=\{(x,y)\in \mathbb{R}^2 \mid xy=0\} \]
consists of two lines (the \( x\) -axis and the \( y\) -axis), which intersect precisely at the origin. Intuitively, a smooth submanifold of \( \mathbb{R}^2\) must locally resemble a smooth curve (or a single point). Away from the origin, the set \( S\) consists of a pair of smooth lines, but at the origin the submanifold condition fails: every small neighborhood of the origin in \( S\) is a cross, and removing the origin splits it into four pieces, whereas removing a point from an open arc leaves only two.
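To verify this, we compute the partial derivatives of the function \( f(x,y)=xy\) :
\[ \frac{\partial f}{\partial x}(x,y)=y, \qquad \frac{\partial f}{\partial y}(x,y)=x. \]
At \( (0,0)\) , both derivatives vanish:
\[ \frac{\partial f}{\partial x}(0,0)=\frac{\partial f}{\partial y}(0,0)=0. \]
This confirms that \( (0,0)\) is indeed a singular point.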
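The computation can be replayed on a few sample points of the set \( \{xy=0\}\) . In the Python sketch below, the gradient \( (y,x)\) of \( f(x,y)=xy\) is entered by hand and the sample points are arbitrary choices. Note that a vanishing gradient only shows that the regular value argument does not apply at that point; the failure of the manifold property itself comes from the crossing of the two axes.

```python
def f(x, y):
    return x * y

def grad_f(x, y):
    """Gradient of f(x, y) = x*y, namely (df/dx, df/dy) = (y, x)."""
    return (y, x)

points_on_S = [(0.0, 0.0), (1.0, 0.0), (0.0, -2.0)]   # all satisfy x*y = 0
for point in points_on_S:
    print(point, f(*point), grad_f(*point))
```

Only the origin has a zero gradient; at the other sample points the gradient is nonzero, consistent with the axes being smooth away from the origin.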
This illustrates an example of a singular algebraic variety, where the smooth parts (the axes away from the origin) are manifolds, but the overall structure is not a manifold due to the singularity at the origin.
In this chapter, we undertake a comprehensive investigation of the notion of differentiability within an extended framework. This expands beyond elementary calculus to encompass both differentiable functions and the rich structure of differentiable manifolds. Building upon this foundation, we systematically develop the essential constructions of tangent vectors and tangent spaces, alongside their dual counterparts, cotangent spaces. These concepts not only underpin the analytical machinery of differential geometry but also enable far-reaching applications across mathematics (from topology to dynamical systems) and physics, particularly in the geometric formulation of classical mechanics, general relativity, and gauge field theories.
More precisely, the notion of differentiability serves in mathematics in:
In the vein of the previous chapter, the Gateaux directional derivative is the most natural notion to start with. We introduce it below.
Let \( \mathcal{M}\) be a manifold modeled on a topological vector space \( \mathcal{E}\) , and let us assume that a differentiable structure can be defined on \( \mathcal{E}\) . When \( \mathcal{E}\) is a Hausdorff locally convex vector space, one can define a differentiable structure via the Gateaux directional derivative.
Suppose that \( X\) and \( Y\) are locally convex topological vector spaces. Assume \( U \subset X\) is open and that we have a map \( F: U \to Y\) . The Gateaux differential \( dF(x;\varphi )\) of \( F\) at \( x \in U\) in the direction \( \varphi\in X\) is defined as
\[ dF(x;\varphi)=\lim_{\tau\to 0}\frac{F(x+\tau\varphi)-F(x)}{\tau}=\left.\frac{d}{d\tau}F(x+\tau\varphi)\right|_{\tau=0}. \]
(2)
If this limit exists for all \( \varphi\in X\) , then \( F\) is said to be Gateaux differentiable at \( x\) .
The limit is taken relatively to the topology of \( Y\) .
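In finite dimensions the Gateaux differential, that is, the limit of the difference quotient \( (F(x+\tau\varphi)-F(x))/\tau\) as \( \tau\to 0\) , can be observed directly. The Python sketch below uses a hypothetical map \( F:\mathbb{R}^2\to\mathbb{R}\) , \( F(x_1,x_2)=x_1^2+3x_2\) , for which a hand computation gives \( dF(x;\varphi)=2x_1\varphi_1+3\varphi_2\) ; the map, the point, the direction and the function names are illustrative assumptions.

```python
def F(x):
    """Hypothetical example map F: R^2 -> R, F(x1, x2) = x1**2 + 3*x2."""
    return x[0] ** 2 + 3.0 * x[1]

def gateaux_quotient(F, x, phi, tau):
    """Difference quotient (F(x + tau*phi) - F(x)) / tau."""
    shifted = tuple(xi + tau * p for xi, p in zip(x, phi))
    return (F(shifted) - F(x)) / tau

x, phi = (1.0, 2.0), (0.5, -1.0)
exact = 2.0 * x[0] * phi[0] + 3.0 * phi[1]   # dF(x; phi) computed by hand
for tau in (1e-1, 1e-3, 1e-5):
    print(tau, gateaux_quotient(F, x, phi, tau), exact)
```

As \( \tau\) decreases, the difference quotient approaches the hand-computed value, in line with the definition as a limit.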
Indeed:
This short introduction to Gateaux’s derivative guides us towards the more standard terminology of differentiable manifolds.
We start with a definition on differentiable manifolds.
Definition 40 (Differentiable manifold)
A \(\mathcal{C}^k\)-differentiable manifold \(\mathcal{M}\) is a topological manifold where the condition \((\mathcal{M}3)\) is substituted by a new condition:
for each pair of indices \(i, j \in I\), the transition map between overlapping coordinate charts,
\[ \varphi_j\circ\varphi_i^{-1}:\varphi_i(U_i\cap U_j)\to \varphi_j(U_i\cap U_j), \]
is of class \(\mathcal{C}^{k}\), meaning it is \(k\)-times continuously differentiable. Moreover, for every \(i, j \in I\), the set \(\varphi_i(U_i \cap U_j)\) is open in the model space \(\mathcal{E}\).
Moreover, we have the following compatibility criterion.
Definition 41 (Compatible atlas)
Two \( \mathcal{C}^{k}\) atlases on \( \mathcal{M}\) , modeled on \( \mathcal{E}\) , are said to be compatible if their union is another such atlas.
Remark that this notion of compatibility is in fact an equivalence relation.
Exercise 21
Prove the remark above.
Definition 42 (Admissible atlas)
Given a manifold \( \mathcal{M}\) , all atlases, modeled on \( \mathcal{E}\) , and lying within the same equivalence class are said to be admissible on \( \mathcal{M}\) .
It is enough to have an admissible atlas to define the structure of a manifold.
Definition 43 (Differentiable mappings)
Consider a pair of \( \mathcal{C}^k\) -differentiable manifolds, denoted \( \mathcal{M}\) and \( \mathcal{M}'\) .
Notice that these definitions do not depend on the chart.
Remark 3
Since the implicit function theorem does not hold for arbitrary locally convex spaces, this definition has limited utility unless we introduce more information on the topology.
Interesting properties are obtained in the case where the model space \( \mathcal{E}=\mathfrak{B}\) is a Banach vector space: a natural differentiability structure is then provided by the Banach derivation. In this case the implicit function theorem makes it possible to derive interesting properties.
Definition 44 (Banach manifold)
A \( \mathcal{C}^k\) -Banach manifold \( \mathcal{B}\) is a manifold such that condition \( \mathcal{M} 3\) is replaced by:
\( \bullet\) \( \mathcal{M} 3''.\) The map
\[ \varphi_j\circ\varphi_i^{-1}:\varphi_i(\mathcal{U}_i\cap \mathcal{U}_j)\to \varphi_j(\mathcal{U}_i\cap \mathcal{U}_j) \]
is a \( \mathcal{C}^{k}\) -isomorphism on \( \mathfrak{B}\) for each pair of indices \( i,j\in I\) , and \( \varphi_i(\mathcal{U}_i\cap \mathcal{U}_j)\) is open in \( \mathfrak{B}\) .
Definition 45 (Riemann manifold)
A Riemannian manifold is a differentiable manifold modeled on a real vector space \( \mathcal{E}\) , the topology of which is given by a scalar product \( \langle\cdot|\cdot\rangle\) .
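Let us recall what a scalar product is. A scalar product on the vector space \( \mathcal{E}\) is a bilinear symmetric form
\[ \langle\cdot|\cdot\rangle:\mathcal{E}\times \mathcal{E}\to \mathbb{R}, \]
where:
The bi-linearity property is satisfied. For any scalars \( \alpha, \beta\) and for any \( x,x_1,x_2,y,y_1,y_2\in \mathcal{E}\) the following holds:
\[ \langle \alpha x_1+\beta x_2|y\rangle=\alpha\langle x_1|y\rangle+\beta\langle x_2|y\rangle, \qquad \langle x|\alpha y_1+\beta y_2\rangle=\alpha\langle x|y_1\rangle+\beta\langle x|y_2\rangle. \]
The scalar product is symmetric. For any \( x,y \in \mathcal{E}\) :
\[ \langle x|y\rangle=\langle y|x\rangle. \]
The scalar product is positive definite. For any \( x\in \mathcal{E}\) :
\[ \langle x|x\rangle\geq 0, \quad \text{and} \quad \langle x|x\rangle=0 \text{ if and only if } x=0. \]
A Riemann manifold is said to be a Banach manifold for the norm:
\[ \|x\|=\sqrt{\langle x|x\rangle}. \]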
In the case where the model space is a Hilbert space, we speak of Hilbert manifold. Hence a Riemann manifold is a real Hilbert manifold.
A particularly interesting class of Riemannian manifolds is the manifold endowed with the model space \( \mathcal{E} = \mathbb{R}^{n} \). The usual topology and differentiability structure on \( \mathbb{R}^{n} \) induce a differentiable structure on \( \mathcal{M} \) , making it a real \( n \) -dimensional differentiable manifold.
In the following, by an \( n\) -dimensional (real) manifold \( \mathcal{M} \) , we mean a Hausdorff topological space in which every point has a neighborhood homeomorphic to \( \mathbb{R}^n\) .
Definition 46 (\( n\) -dimensional differentiable manifold)
An \( n\) -dimensional real manifold of class \( C^k\) is a manifold modeled on \( \mathbb{R}^{n}\) and such that the condition \(\mathcal{M} 3\) is replaced by the following property. For each pair of indices \( i,j\in I\) , the map:
\[ \varphi_j\circ\varphi_i^{-1}:\varphi_i(\mathcal{U}_i\cap \mathcal{U}_j)\to \varphi_j(\mathcal{U}_i\cap \mathcal{U}_j) \]
is a \( \mathcal{C}^{k}\) -isomorphism on \( \mathbb{R}^{n}\) .
Furthermore, for any pairs of indices \( i,j\in I\) the image of the intersection \( \mathcal{U}_i\cap \mathcal{U}_j\) given by \( \varphi_i(\mathcal{U}_i\cap \mathcal{U}_j)\) , under the coordinate map \( \varphi_i\) , is an open subset of \( \mathbb{R}^{n}\) .
A natural way to describe the structure of an \( n\) -dimensional differentiable manifold is through local coordinate systems. These coordinate systems are given by charts, which map open subsets of the manifold to open subsets of \( \mathbb{R}^n\) , allowing us to analyze the manifold locally as if it were a Euclidean space.
Definition 47 (Local coordinate system)
Let \( (\mathcal{U},\varphi)\) be a chart at the point \( m\in \mathcal{M}\) , such that the image \( \varphi(m)=x=(x^1,\dots,x^n)\in\mathbb{R}^{n}\) . Given a basis \( \{e_{1},\dots,e_{n}\}\) of the Euclidean space \( \mathbb{R}^{n}\) , the coordinates \( (x^1,\dots,x^n)\) of the image \( \varphi(m)\in\mathbb{R}^{n}\) of \( m\in \mathcal{U}\subset\mathcal{M}\) are called the coordinates of \( m\) in the chart \( (\mathcal{U},\varphi)\) . The chart \( (\mathcal{U},\varphi)\) is also called a local coordinate system.
Local coordinate systems provide a way to describe differentiable manifolds in terms of open subsets of \( \mathbb{R}^{n}\) , allowing us to define smooth functions and analyze local geometric properties. In particular, this enables the construction of tangent spaces, which capture the local linear structure of the manifold at each point. The dual spaces to these, known as cotangent spaces, naturally arise when considering differential forms and gradients of functions, playing a fundamental role in differential geometry and analysis on manifolds.
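As a concrete illustration, here is a small worked example that is not part of the original text. Consider the unit circle \( S^1\subset\mathbb{R}^2\) with two charts; the names \( \theta\) and \( \theta'\) for the angular coordinates are chosen here for convenience:
\[ \mathcal{U}=S^1\setminus\{(-1,0)\},\quad \varphi(\cos\theta,\sin\theta)=\theta\in(-\pi,\pi),\qquad \mathcal{U}'=S^1\setminus\{(1,0)\},\quad \varphi'(\cos\theta',\sin\theta')=\theta'\in(0,2\pi). \]
The point \( (0,1)\) has coordinate \( \pi/2\) in both charts, while \( (0,-1)\) has coordinate \( -\pi/2\) in the first chart and \( 3\pi/2\) in the second. On the overlap the change of chart reads
\[ \varphi'\circ\varphi^{-1}(\theta)=\begin{cases}\theta, & \theta\in(0,\pi),\\ \theta+2\pi, & \theta\in(-\pi,0),\end{cases} \]
which is smooth on each connected component of \( \varphi(\mathcal{U}\cap\mathcal{U}')=(-\pi,0)\cup(0,\pi)\) .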
To grasp the essence of the tangent and cotangent spaces, we begin with the most intuitive objects on a manifold: curves. A curve provides a way to move infinitesimally along the manifold, and by examining how functions change along such curves, we arrive at the concept of directional derivatives. These, in turn, will guide us toward a precise definition of tangent vectors. Finally, this will naturally lead us to the construction of a tangent space, the fundamental linear structure underlying the local geometry of the manifold.
Definition 48 (Curve)
A curve \( \gamma:[a,b]\to \mathcal{M}\) on a manifold \( \mathcal{M}\) is a \( C^p\) -map, with \( p\geq 1\) , from an interval \( \mathcal{I}=[a,b]\subset \mathbb{R}\) into \( \mathcal{M}\) .
The curve is said to be smooth if it is a \( C^\infty\) map.
Definition 49 (Tangent vector to a curve)
Let \( \gamma:t\mapsto \gamma(t)\) be a curve of class \( C^1\) such that:
\[ \gamma(t_0)={\scriptstyle M}_0. \]
Then, the tangent vector \( u_0\) to \( \gamma\) at \( {\scriptstyle M}_0\) is defined by
\[ u_0=\frac{d\gamma}{dt}\Big|_{t_0}. \tag{3} \]
Definition 50 (Directional derivative)
Let \( \mathcal{A}({\scriptstyle M}) \) be the family of \( C^1 \)-functions defined on a neighborhood of \( {\scriptstyle M} \in \mathcal{M} \). Assume \( f \in \mathcal{A}({\scriptstyle M}) \).
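Consider a curve \( \gamma \) of class \( C^1 \),
\[ \gamma:\ t\mapsto \gamma(t)\in\mathcal{M}, \]
such that at some fixed parameter value \( t_0 \), we have
\[ \gamma(t_0)={\scriptstyle M}. \]
Then, the directional derivative of \( f \) in the direction of the curve \( \gamma \) at \( t_0 \) is defined as
\[ \frac{d}{dt}f(\gamma(t))\Big|_{t=t_0}. \tag{4} \]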
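A short worked example may help here. It is an illustration added under the assumption that the directional derivative along \( \gamma\) is the derivative of \( t\mapsto f(\gamma(t))\) at \( t_0\) ; the manifold, curve and function below are chosen only for this purpose. Take \( \mathcal{M}=\mathbb{R}^2\) , the curve \( \gamma(t)=(\cos t,\sin t)\) , \( t_0=0\) , and \( f(x,y)=x^2+3y\) . Then \( \gamma(0)=(1,0)\) and
\[ u_0=\frac{d\gamma}{dt}\Big|_{t=0}=(-\sin 0,\cos 0)=(0,1),\qquad \frac{d}{dt}f(\gamma(t))\Big|_{t=0}=\big(-2\cos t\sin t+3\cos t\big)\Big|_{t=0}=3. \]
The same value is obtained from the gradient, \( \nabla f(1,0)\cdot u_0=(2,3)\cdot(0,1)=3\) , which shows that the result depends on the curve only through its tangent vector \( u_0\) .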
More generally:
Definition 51 (Derivation)
A derivation at a point \( {\scriptstyle M}\) on \( \mathcal{M}\) is a linear functional
where the Leibniz rule holds:
(5)
(6)
One can associate, via the formula (4), at each point \( {\scriptstyle M}\in \mathcal{M}\) a tangent vector \( u\) .
Definition 52 (Tangent vector)
A tangent vector \( X_{\scriptstyle M} \) at a point \( {\scriptstyle M} \in \mathcal{M} \) on a differentiable manifold \( \mathcal{M} \) is a linear map
\[ X_{\scriptstyle M}:\ \mathcal{A}({\scriptstyle M})\to\mathbb{R}, \]
where \( \mathcal{A}({\scriptstyle M}) \) denotes the space of functions defined and differentiable in some neighborhood of \( {\scriptstyle M} \in \mathcal{M} \).
This map satisfies the Leibniz rule: for any \( f, g \in \mathcal{A}({\scriptstyle M}) \),
\[ X_{\scriptstyle M}(fg)=X_{\scriptstyle M}(f)\,g({\scriptstyle M})+f({\scriptstyle M})\,X_{\scriptstyle M}(g). \tag{7} \]
Notice that a tangent vector at \( {\scriptstyle M}\) takes the same value on two differentiable functions \( f_{1}\) and \( f_{2}\) which coincide on a neighborhood of \( {\scriptstyle M}\) . Therefore, the value of a tangent vector at \( {\scriptstyle M}\) depends only on the class of differentiable functions coinciding on a neighborhood of \( {\scriptstyle M}\) .
The class of functions which coincide with \( f\) on a neighborhood of \( {\scriptstyle M}\) is called the “germ” of \( f\) at \( {\scriptstyle M}\) . The germs form an algebra under the sum and the pointwise product. A tangent vector (also called contravariant vector) is a derivation on the algebra of germs of differentiable functions at \( {\scriptstyle M}\) .
Definition 53 (Tangent space)
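The space of all tangent vectors at \( {\scriptstyle M} \in \mathcal{M} \), equipped with the addition and scalar multiplication defined by
\[ (X_{\scriptstyle M}+Y_{\scriptstyle M})(f)=X_{\scriptstyle M}(f)+Y_{\scriptstyle M}(f),\qquad (\alpha X_{\scriptstyle M})(f)=\alpha\,X_{\scriptstyle M}(f),\quad \alpha\in\mathbb{R}, \tag{8} \]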
forms a real vector space, called the tangent space at \( {\scriptstyle M} \), and denoted by \( \mathcal{T}_{\scriptstyle M}(\mathcal{M}) \).
Let \( \gamma\) be a curve of class \( C^1\) such that \( \gamma(t_0)={\scriptstyle M}_0\) and let \( u_{0}=\frac{d\gamma}{dt} \vert_{t_{0}}\) be the tangent vector at \( {\scriptstyle M}_0\) along \( \gamma\) . A tangent vector \( X_{{\scriptstyle M}_0}\) at \( {\scriptstyle M}_0\in \mathcal{M}\) in the direction \( u_{0}\) can be written
\[ X_{{\scriptstyle M}_0}(f)=\frac{d}{dt}f(\gamma(t))\Big|_{t=t_0},\qquad f\in\mathcal{A}({\scriptstyle M}_0). \tag{9} \]
Consider now the case of an \( n\) -dimensional real manifold \( \mathcal{M}\) . Let \( (\mathcal{U},\varphi)\) be a chart at the point \( {\scriptstyle M}\in \mathcal{M}\) and let \( x=(x^1,\dots,x^n) \in \varphi(\mathcal{U})\subset \mathbb{R}^n\) be a local coordinate system. A tangent vector \( X_{{\scriptstyle M}_0}\) is represented by the set \( \{ X_{{\scriptstyle M}_0}(\varphi^{i}) \}_{i=1}^{n}\) , called the local coordinates of the tangent vector, where:
\[ \varphi^{i}=\pi^{i}\circ\varphi, \]
and where \( \pi^i\) is the \( i\) -th projection operator in \( \mathbb{R}^{n}\) .
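Moreover, for all \( C^1\) -functions \( f\) on a neighborhood of \( {\scriptstyle M}\in \mathcal{M}\) ,
\[ X_{\scriptstyle M}(f)=\sum_{i=1}^{n}X_{\scriptstyle M}(\varphi^{i})\,\frac{\partial (f\circ\varphi^{-1})}{\partial x^{i}}\Big|_{\varphi({\scriptstyle M})}. \tag{10} \]
The equation (10) can be written in the abbreviated form:
\[ X_{\scriptstyle M}=\sum_{i=1}^{n}X^{i}_{\scriptstyle M}\,\frac{\partial}{\partial x^{i}},\qquad X^{i}_{\scriptstyle M}=X_{\scriptstyle M}(\varphi^{i}). \tag{11} \]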
Remark 4
Any other basis (or frame) can be obtained from the local coordinate basis. Let \( \{e_{i}\}_{i=1}^{n}\) be a basis of \( T_{\scriptstyle M}( \mathcal{M})\) with:
where \( \Phi_{k}^{i}\) is the matrix corresponding to an invertible linear transformation. We have
the \( X^{k}_v\) being the component of \( X_{\scriptstyle M}\) in the basis \( \{e_{i}\}_{i=1}^{n}\) .
Tangent and cotangent spaces are dual vector spaces, linked by a canonical pairing. We discuss now the notion of cotangent vector spaces.
Definition 54 (Cotangent vector space)
Let \( \mathcal{M}\) be a manifold. The dual space \( \mathcal{T}^\star_{\scriptstyle M}(\mathcal{M})\) to the tangent vector space \( \mathcal{T}_{\scriptstyle M}(\mathcal{M})\) in \( {\scriptstyle M}\in \mathcal{M}\) is the space of linear forms on \( \mathcal{T}_{\scriptstyle M}(\mathcal{M})\) . It is a vector space called the cotangent vector space to \( \mathcal{M}\) at \( {\scriptstyle M}\) .
The elements of \( \mathcal{T}^\star_{\scriptstyle M}(\mathcal{M})\) are called cotangent vectors, or covariant vectors, or covectors, or differential 1-forms.
Let \( \omega_{\scriptstyle M} \in \mathcal{T}^\star_{\scriptstyle M}(\mathcal{M})\) be a differential 1-form and consider \( X_{\scriptstyle M} \in \mathcal{T}_{\scriptstyle M}(\mathcal{M})\) . Then:
(12)
In the case of an \( n\) -dimensional real manifold, a useful canonical isomorphism between a space and its dual can be chosen as follows.
Let \( (e_1,e_2,\dots,e_n)\) be a basis in \( \mathcal{T}_{\scriptstyle M}(\mathcal{M})\) . We may construct its dual \( (\varepsilon^1,\varepsilon^2,\dots,\varepsilon^n)\) by
\[ \varepsilon^{i}(X_{\scriptstyle M})=X^{i}_{\scriptstyle M}, \tag{13} \]
where \( X^i_{\scriptstyle M}\) is the component of \( X_{\scriptstyle M}\) in the basis \( \{e_j\}\) and
\[ \varepsilon^{i}(e_j)=\delta^{i}_{j}. \tag{14} \]
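If we have chosen the natural basis \( \{e_{i}=\partial/\partial x^i\}_{i=1}^{n}\) then the dual basis is denoted by \( \{\epsilon^{i}=dx^{i}\}_{i=1}^{n}\) with
\[ dx^{i}\Big(\frac{\partial}{\partial x^{j}}\Big)=\delta^{i}_{j}. \tag{15} \]
A tangent vector \( X_{\scriptstyle M}\in \mathcal{T}_{\scriptstyle M}\) takes the form
\[ X_{\scriptstyle M}=\sum_{i=1}^{n}X^{i}_{\scriptstyle M}\,\frac{\partial}{\partial x^{i}}, \]
whereas a cotangent vector \( \omega_{\scriptstyle M} \in \mathcal{T}_{\scriptstyle M}^\star(\mathcal{M})\) in the dual basis is given by
\[ \omega_{\scriptstyle M}=\sum_{i=1}^{n}\omega_{i}\,dx^{i},\qquad \omega_i=\omega_{\scriptstyle M}\Big(\frac{\partial}{\partial x^{i}}\Big). \]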
For a real manifold of finite dimension, we have the following equality:
(16)
Let \( \mathcal{M}\) be an \( n\) -dimensional real manifold. Given a chart \( (\mathcal{U},\varphi)\) with a local coordinate system \( \varphi:{\scriptstyle M}\mapsto \varphi({\scriptstyle M})=x=(x^1,\dots,x^n)\in \mathbb{R}^n\) at \( {\scriptstyle M} \in \mathcal{M}\) , the local coordinate basis on the tangent space is given by
\[ \{e_{i}=\partial/\partial x^{i}\}_{i=1}^{n} \]
and its dual basis is
\[ \{\epsilon^{i}=dx^{i}\}_{i=1}^{n}. \]
Had we chosen a different chart, say \( (\mathcal{U}',\varphi')\) with \( \varphi': {\scriptstyle M}\mapsto x'=({x'}^1,\dots,{x'}^n)\) at \( {\scriptstyle M}\in \mathcal{M}\) , a local coordinate basis would be given by \( \{e'_{i}=\partial/\partial {x'}^{i}\}_{i=1}^{n}\) , with dual basis \( \{{\epsilon'}^{i}=d{x'}^{i}\}_{i=1}^{n}\) . We can certainly express one local coordinate basis in terms of the other on the open set \( \varphi(\mathcal{U})\cap\varphi'(\mathcal{U}')\) .
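Let us break this statement down. Under a given change of coordinates, the local coordinates of a tangent vector are described as follows:
\[ X_{\scriptstyle M}=\sum_{j=1}^{n}X^{j}\frac{\partial}{\partial x^{j}}=\sum_{i,j=1}^{n}X^{j}\frac{\partial {x'}^{i}}{\partial x^{j}}\frac{\partial}{\partial {x'}^{i}}=\sum_{i=1}^{n}{X'}^{i}\frac{\partial}{\partial {x'}^{i}}. \]
So, the vector transformation law is given by:
\[ {X'}^{i}=\sum_{j=1}^{n}\frac{\partial {x'}^{i}}{\partial x^{j}}\,X^{j}. \tag{17} \]
Similarly, for the covector we have
\[ \omega'_{i}=\sum_{j=1}^{n}\frac{\partial x^{j}}{\partial {x'}^{i}}\,\omega_{j}. \tag{18} \]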
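To make the change of chart concrete, here is a worked example added for illustration; it assumes the usual chain-rule transformation of coordinate bases and uses polar coordinates on an open subset of \( \mathbb{R}^2\) where \( r>0\) and \( \theta\) ranges over an open interval of length less than \( 2\pi\) . With \( x=r\cos\theta\) , \( y=r\sin\theta\) , the coordinate bases are related by
\[ \frac{\partial}{\partial r}=\cos\theta\,\frac{\partial}{\partial x}+\sin\theta\,\frac{\partial}{\partial y},\qquad \frac{\partial}{\partial \theta}=-r\sin\theta\,\frac{\partial}{\partial x}+r\cos\theta\,\frac{\partial}{\partial y}, \]
\[ dx=\cos\theta\,dr-r\sin\theta\,d\theta,\qquad dy=\sin\theta\,dr+r\cos\theta\,d\theta. \]
For the covector \( \omega=x\,dy-y\,dx\) one finds \( \omega=r^{2}\,d\theta\) in polar coordinates. Evaluating on \( \partial/\partial\theta=-y\,\partial/\partial x+x\,\partial/\partial y\) gives \( x^2+y^2\) in Cartesian coordinates and \( r^2\) in polar coordinates: the components change with the chart, the pairing does not.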
Having explored the implications of changing the chart for vectors and covectors, a logical next step is to establish a rigorous definition of a function’s differential.
Definition 55
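Let \( f\) be a differentiable function. The differential \( df|_{\scriptstyle M}\) of \( f\) on \( \mathcal{M}\) in the neighborhood of a point \( {\scriptstyle M}\) is defined by the 1-form
\[ df|_{\scriptstyle M}(X_{\scriptstyle M})=X_{\scriptstyle M}(f),\qquad X_{\scriptstyle M}\in\mathcal{T}_{\scriptstyle M}(\mathcal{M}). \tag{19} \]
In the natural basis
\[ df|_{\scriptstyle M}=\sum_{i=1}^{n}\frac{\partial f}{\partial x^{i}}\Big|_{\scriptstyle M}\,dx^{i}. \]
More generally, let \( \{e_{i}\}_{i=1}^{n}\) be a basis in \( \mathcal{T}_{\scriptstyle M}(\mathcal{M})\) and let \( \{\epsilon^{i}\}_{i=1}^{n}\) be its dual basis. The value of \( df|_{\scriptstyle M}\in \mathcal{T}^{\star}_{_{M}}(\mathcal{M})\) at \( X_{\scriptstyle M}\in\mathcal{T}_{\scriptstyle M}(\mathcal{M})\) is given by
\[ df|_{\scriptstyle M}(X_{\scriptstyle M})=X_{\scriptstyle M}(f)=\sum_{i=1}^{n}X^{i}_{\scriptstyle M}\,e_{i}(f). \]
Hence,
\[ df|_{\scriptstyle M}=\sum_{i=1}^{n}e_{i}(f)\,\epsilon^{i}. \tag{20} \]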
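As a quick check of the definition, consider the following example, which is added here for illustration and assumes the convention \( df(X)=X(f)\) . On \( \mathbb{R}^2\) with the natural basis, take \( f(x,y)=x^{2}y\) , the point \( {\scriptstyle M}=(1,2)\) and the vector \( X_{\scriptstyle M}=3\,\partial/\partial x-\partial/\partial y\) . Then
\[ df=2xy\,dx+x^{2}\,dy,\qquad df|_{\scriptstyle M}(X_{\scriptstyle M})=4\cdot 3+1\cdot(-1)=11, \]
and, computing the derivation directly,
\[ X_{\scriptstyle M}(f)=\Big(3\,\frac{\partial f}{\partial x}-\frac{\partial f}{\partial y}\Big)\Big|_{(1,2)}=3\cdot 4-1=11. \]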
Having previously established the fundamental geometric notions of vectors and differential forms (linear maps on vectors), along with differentials of functions (a type of 1-form), we naturally arrive at the notion of tensors, which encode multilinear maps between these objects.
For simplicity, in this section, we assume that the manifold is finite dimensional.
Definition 56
Given a point \( {\scriptstyle M}\in \mathcal{M}\) , a tensor \( T^{(r,s)}_{\scriptscriptstyle M}\) of type \( (r,s)\) is a multi-linear form on the space \( \underbrace{ \mathcal{T}_{\scriptstyle M}\times \dots \times \mathcal{T}_{\scriptstyle M}}_{r}\times \underbrace{\mathcal{T}_{\scriptstyle M}^\star \times\dots \times \mathcal{T}_{\scriptstyle M}^\star }_{s}= \mathcal{T}_{\scriptstyle M}^{\otimes r}\times {\mathcal{T}_{\scriptstyle M}^\star}^{\otimes s},\)
defined by
Furthermore, this object adheres to the following properties.
for scalars \( \alpha,\beta \in \mathbb{R}\) and all \( 1\leq j\leq r\) as well as \( 1\leq k\leq s\) .
Moreover, if \( T_{\scriptscriptstyle M}\) and \( S_{\scriptscriptstyle M}\) are two tensors of the same type \( (r,s)\) , then
(21)
and tensors of a given type \( (r,s)\) span a linear vector space of dimension \( n^{r+s}\) .
The following denominations are often used in the literature.
Definition 57 (Tensor product)
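Let \( T_{\scriptscriptstyle M}\) be a tensor at the point \( \scriptstyle M\in \mathcal{M}\) of type \( (r,s)\) and let \( S_{\scriptscriptstyle M}\) be a tensor of type \( (p,q)\) . Then, the tensor \( T_{\scriptscriptstyle M}\otimes S_{\scriptscriptstyle M}\) of type \( (r+p,s+q)\) defined by
\[ (T_{\scriptscriptstyle M}\otimes S_{\scriptscriptstyle M})(X_1,\dots,X_{r+p},\omega^{1},\dots,\omega^{s+q})=T_{\scriptscriptstyle M}(X_1,\dots,X_r,\omega^{1},\dots,\omega^{s})\,S_{\scriptscriptstyle M}(X_{r+1},\dots,X_{r+p},\omega^{s+1},\dots,\omega^{s+q}) \tag{22} \]
is called the tensor product of \( T_{\scriptscriptstyle M}\) and \( S_{\scriptscriptstyle M}\) .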
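The simplest instance may be useful. The following example is added for illustration and assumes that the tensor product evaluates each factor on its own arguments and multiplies the results. For two covectors \( \alpha,\beta\) (tensors of type \( (1,0)\) ) and two vectors \( X,Y\) ,
\[ (\alpha\otimes\beta)(X,Y)=\alpha(X)\,\beta(Y). \]
On \( \mathbb{R}^2\) with \( X=\partial/\partial x+2\,\partial/\partial y\) and \( Y=3\,\partial/\partial x+4\,\partial/\partial y\) this gives
\[ (dx\otimes dy)(X,Y)=1\cdot 4=4,\qquad (dy\otimes dx)(X,Y)=2\cdot 3=6, \]
so the tensor product is not commutative.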
Admitting this definition, an \( (r,s)\) -tensor \( T\) can be seen as the tensorial product of \( r\) covariant vectors and \( s\) tangent vectors at \( \scriptstyle M \in \mathcal{M}\) as we can see:
where the \( T_{(i)}\) are covectors and the \( T^{(k)}\) are contravariant vectors.
By abuse of notation, we have omitted the index \( \scriptscriptstyle M\) to have a more convenient formula. However, it is important to remember that the definition is local.
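To conclude, the space of tensors of type \( (r,s)\) can be identified with the tensor product space:
\[ T_{_{M}}^{\star}(\mathcal{M})^{\otimes r} \otimes T_{_{M}}(\mathcal{M})^{\otimes s }. \]
Let us choose a basis of \( T_{_{M}}^{\star}(\mathcal{M})^{\otimes r} \otimes T_{_{M}}(\mathcal{M})^{\otimes s }\) given by the \( n^{(r+s)}\) vectors:
\[ \epsilon^{i_1}\otimes\dots\otimes\epsilon^{i_r}\otimes e_{j_1}\otimes\dots\otimes e_{j_s},\qquad 1\leq i_1,\dots,i_r,j_1,\dots,j_s\leq n, \tag{23} \]
where \( \{\epsilon^{i}\}_{i=1}^{n}\) is the dual basis of \( \{e_{j}\}_{j=1}^{n}\) .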
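If we work in the framework of the local coordinate basis, the associated basis of the \( (r,s)\) -tensor space is
\[ dx^{i_1}\otimes\dots\otimes dx^{i_r}\otimes\frac{\partial}{\partial x^{j_1}}\otimes\dots\otimes\frac{\partial}{\partial x^{j_s}}, \]
then the \( (r,s)\) -tensor \( T\) is given by:
\[ T=\sum_{i_1,\dots,i_r=1}^{n}\ \sum_{j_1,\dots,j_s=1}^{n}T_{i_1\dots i_r}^{\phantom{i_1\dots i_r}j_1\dots j_s}\,dx^{i_1}\otimes\dots\otimes dx^{i_r}\otimes\frac{\partial}{\partial x^{j_1}}\otimes\dots\otimes\frac{\partial}{\partial x^{j_s}}. \tag{24} \]
To enhance the clarity, it is advantageous to adopt the Einstein summation convention, where repeated upper and lower indices are implicitly summed over. In the following formula, we illustrate this convention.
\[ T=T_{i_1\dots i_r}^{\phantom{i_1\dots i_r}j_1\dots j_s}\,dx^{i_1}\otimes\dots\otimes dx^{i_r}\otimes\frac{\partial}{\partial x^{j_1}}\otimes\dots\otimes\frac{\partial}{\partial x^{j_s}}, \tag{25} \]
where the components are given by,
\[ T_{i_1\dots i_r}^{\phantom{i_1\dots i_r}j_1\dots j_s}=T\Big(\frac{\partial}{\partial x^{i_1}},\dots,\frac{\partial}{\partial x^{i_r}},dx^{j_1},\dots,dx^{j_s}\Big). \tag{26} \]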
Under a change of coordinates, an \( (r,s)\) -tensor transforms via \( r\) applications of the Jacobian (for covariant indices) and \( s\) applications of its inverse (for contravariant indices):
(27)
We now use this framework to study the symmetries of tensors, which reveal their invariant properties and their multilinear algebraic behaviour.
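Let us first consider a (covariant) tensor \( T\) of order \( r\) . Let \( S_{r} \) be the group of permutations of the \( r\) integers \( \{1,2,\dots,r\}\) . By definition, a permutation \( \sigma\in S_r\) acts on \( T\) at \( \scriptstyle M \in \mathcal{M}\) in the following way:
\[ (\sigma T)(X_1,\dots,X_r)=T(X_{\sigma(1)},\dots,X_{\sigma(r)}), \]
or in local coordinates
\[ (\sigma T)_{i_1\dots i_r}=T_{i_{\sigma(1)}\dots i_{\sigma(r)}}, \]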
so that \( T\) has the symmetry defined by \( \sigma\) .
Definition 58
\( \bullet\) If \( \sigma T=T\) for every \( \sigma\in S_r\) , then the tensor \( T\) is said to be symmetric.
\( \bullet\) If \( \sigma T=\operatorname{sign}(\sigma)T\) for every \( \sigma\in S_r\) , with \( \operatorname{sign}(\sigma)=\pm1\) (depending on whether the permutation is even or odd), the tensor \( T\) is said to be antisymmetric.
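One can define a symmetrization operator \( S\) as well as an antisymmetrization operator \( A\) on the (covariant) tensor \( T\) . This is done below:
\[ S(T)=\frac{1}{r!}\sum_{\sigma\in S_r}\sigma T,\qquad A(T)=\frac{1}{r!}\sum_{\sigma\in S_r}\operatorname{sign}(\sigma)\,\sigma T, \tag{28} \]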
which in local coordinates is defined by
(29)
where we have used the Einstein summation rule and
is the Kronecker tensor.
The symmetry properties of a contravariant tensor are defined similarly.
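For orientation, the case \( r=2\) can be written out explicitly. This is an illustration added here, assuming the common normalization by \( 1/r!\) in the symmetrization and antisymmetrization operators. For a covariant tensor with components \( T_{ij}\) ,
\[ S(T)_{ij}=T_{(ij)}=\tfrac{1}{2}\big(T_{ij}+T_{ji}\big),\qquad A(T)_{ij}=T_{[ij]}=\tfrac{1}{2}\big(T_{ij}-T_{ji}\big),\qquad T_{ij}=T_{(ij)}+T_{[ij]}. \]
The last identity is special to \( r=2\) : for \( r\geq 3\) a tensor is in general not the sum of its totally symmetric and totally antisymmetric parts.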
Remark 5
The permutation \( \sigma\) permutes only the labels of objects of the same nature. For an \( (r,s)\) -tensor, the symmetry and antisymmetry properties are defined only for indices of the same nature. It is usual to use square brackets \( [\cdot]\) for antisymmetry and round brackets (\( \cdot\) ) for symmetry. For example, the \( (4,3)\) -tensor \( T_{[abc]d}^{\phantom{[abc]d}(ef)g}\) is antisymmetric in \( a,b,c\) and symmetric in \( e,f\) .
In the language of modern geometry, fiber bundles serve as a unifying framework, extending classical notions such as vector bundles to a broader setting. They naturally arise in various mathematical and physical contexts, providing a structural backbone for spaces with local product structures.
A principal application appears in Gauge Theory, where principal bundles furnish a natural setting for describing gauge fields in physics, including the Yang-Mills theories. Here, the connection on a bundle encodes the dynamics of fundamental interactions, and curvature expresses field strength in a geometrically intrinsic way.
In topology, fiber bundles play a fundamental role in the theory of characteristic classes, such as the Chern classes, which assign global topological invariants to vector bundles. These invariants capture deep global properties of manifolds, serving as indispensable tools in the study of topology, geometry, and mathematical physics.
The notion of bundle has been introduced to generalize the topological product.
Definition 59 (Fiber bundle)
A fiber bundle is a triple \( (\mathcal{B},\mathcal{M},\pi)\) consisting of two topological spaces \( \mathcal{B}\) and \( \mathcal{M}\) and a continuous surjective map
\[ \pi:\ \mathcal{B}\to\mathcal{M}, \]
where \( \pi\) is called the projection of the total space \( \mathcal{B}\) onto the base space \( \mathcal{M}\) .
The topological space \( \mathcal{F}_{\scriptstyle M}= \pi^{-1}({\scriptstyle M}), {\scriptstyle M}\in \mathcal{M}\) is called the fibre at \( {\scriptstyle M}\in \mathcal{M}\) .
Example 11
The simplest example of bundle is the product bundle defined by \( (E_{1}\times E_{2},E_{1},\pi) \) where \( E_{i}\) is isomorphic to \( \mathbb{R}\) and with \( \pi(x,y)=x\) , for all \( x \in E_{1}\) and \( y\in E_{2}\) . This is illustrated by the following diagram.

So, in this example the total space is given by the topological product \( \mathcal{B}= E_1 \times E_2\) and the base space is \( \mathcal{M}=E_1 \) . The projection map is given by \( \pi(x,y)=x\) , where we project \( E_1 \times E_2\) onto \( E_1\) . Naturally, we could have chosen to project on \( E_2\) .
The need to generalize topological products can be seen on the following examples.
a) The cylinder is obtained by taking the topological product \( S^{1}\times \mathcal{I}\) , where \( \mathcal{I}\) is a segment. The fiber bundle is defined by the triple \( (S^{1}\times \mathcal{I},S^{1},\pi)\) , where here we have chosen to define the projection map as follows \( \pi:S^{1}\times \mathcal{I} \to S^1\) .

b) The Möbius band, obtained by twisting a strip and then gluing its opposite edges, is only locally a topological product and is no longer a global one. Locally, for a proper open subset \( U\subset S^1\) , the topological product \( U \times \mathcal{I}\) describes a segment of the Möbius band, but it does not take the twisting operation into account.
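The difference between the two examples can be made explicit by a standard quotient description, added here as an illustration; the normalization \( \mathcal{I}=[-1,1]\) and the parametrization of \( S^1\) by \( e^{2\pi i x}\) are choices made for this sketch. Both total spaces are quotients of the strip \( [0,1]\times[-1,1]\) :
\[ \text{cylinder:}\quad (0,y)\sim(1,y),\qquad\qquad \text{Möbius band:}\quad (0,y)\sim(1,-y), \]
with the same projection onto the base circle in both cases,
\[ \pi\big([x,y]\big)=e^{2\pi i x}\in S^{1}. \]
Every fibre \( \pi^{-1}(e^{2\pi i x})\) is a copy of \( [-1,1]\) , and over any proper open arc \( U\subset S^1\) both spaces look like \( U\times[-1,1]\) . Only the gluing rule at \( x=0\sim 1\) distinguishes them.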
In the following we restrict ourselves to the case where the topological spaces \( \mathcal{F}_{\scriptstyle M}= \pi^{-1}({\scriptstyle M}), {\scriptstyle M}\in \mathcal{M}\) , are all homeomorphic to a space \( F\) , called the typical fibre.
Definition 60 (Trivial bundle)
A trivial bundle \( \mathcal{B}\) is a fiber bundle homeomorphic to a product bundle \( \mathcal{M}\times \mathcal{F}\) and \( \pi \) is just the projection from the product space to \( \mathcal{M}\) .
Definition 61 (Vector bundle)
If \( \mathcal{F}_M\) is a vector space, \( (\mathcal{B},\mathcal{M},\pi)\) is called vector bundle.
Definition 62 (G-bundle)
A fiber bundle \( (\mathcal{B},\mathcal{M},\pi, G)\) is made of:
Definition 63 (Principal fiber bundle)
A \( G\) -bundle \( (\mathcal{B},\mathcal{M},\pi, G)\) in which the typical fiber \( \mathcal{F}\) and the structural group \( G\) are isomorphic by left translation is called a principal fiber bundle.
Definition 64 (Vector bundle)
If \( \mathcal{F}\) is a vector space, \( (\mathcal{B},\mathcal{M},\pi)\) is called vector bundle.
A vector bundle is a \( G\) -bundle with \( G\) the linear group of \( \mathcal{F}\) .
Definition 65 (Cross-section of a bundle)
A cross-section of the bundle \( (\mathcal{B},\mathcal{M},\pi,G)\) is a mapping \( \sigma: \mathcal{M} \to \mathcal{B}\) such that \( \pi \circ \sigma = I_\mathcal{M}\) , the identity in \( \mathcal{M}\) .
Theorem 12
A principal fiber bundle \( (\mathcal{B},\mathcal{M},\pi, G)\) is trivial if and only if it admits a continuous cross-section.
Proof
First assume that \( \mathcal{B}\) has a cross-section \( \sigma: \mathcal{M} \to \mathcal{B}\) . Then
Given \( p\in \mathcal{F}_x=\pi^{-1}(x)\) , there exists a unique \( g_0\in G\) such that \( p= L_{g_0}\sigma(x)=g_0\sigma(x)\) . Then,
is a homomorphism which preserves the group structure of the fibers
In particular, \( \phi_\sigma(\sigma(x))=(x,e)\) , where \( e\) is the identity of \( G\) . This shows that the existence of a continuous cross-section implies the triviality.
Conversely, if the principal bundle is trivial, i.e. \( \mathcal{B}= \mathcal{M} \times G\) , then
\[ \sigma:\ \mathcal{M}\to\mathcal{M}\times G \]
defined by \( x\mapsto (x,k(x))\) (where \( k : \mathcal{M} \to G\) is some continuous mapping) forms a cross-section of \( \mathcal{B}\) .
The notion of tangent spaces can be upgraded to the structure of a tangent bundle. We describe this in the next section below.
Definition 66 (Tangent bundle)
A tangent bundle \( (\mathcal{T}(\mathcal{M}),\mathcal{M},\pi,G)\) is a bundle with fibers being the tangent spaces.
In the case of a \( n\) -dimensional manifold, the tangent-bundle \( (\mathcal{T}(\mathcal{M}), \mathcal{M}, \pi,G)\) can be identified to the natural bundle defined by
Let \( \mathcal{M}\) be an \( n\) -dimensional manifold. A frame \( \phi_{\scriptscriptstyle M}\) associated with the tangent space \( \mathcal{T}_{\scriptstyle M}(\mathcal{M})\) of \( \mathcal{M}\) is a set of \( n\) linearly independent vectors \( \{f_{1},\dots,f_{n}\}\) , which can be expressed as linear combinations of a particular basis \( \{e_{1},\dots,e_{n}\}\) of \( \mathcal{T}_{\scriptstyle M}(\mathcal{M})\) , such that
\[ f_{i}=\sum_{j=1}^{n}a_{i}^{\,j}\,e_{j},\qquad (a_{i}^{\,j})\in GL(n,\mathbb{R}). \]
This implies that there exists a bijection between the set of all frames in \( \mathcal{T}_{\scriptstyle M}(\mathcal{M})\) and the group \( GL(n,\mathbb{R})\) .
Definition 67 (Frame bundle)
Let us consider the collection
and \( \mathcal{M}\) is equipped with a differentiable structure. Then, the four-tuple
\( (\Phi(\mathcal{M}),\mathcal{M},\pi, GL(n,\mathbb{R}))\) with typical fiber \( GL(n,\mathbb{R})\) and structural group \( GL(n,\mathbb{R})\) is called the frame bundle on \( \mathcal{M}\) .
Furthermore, a frame bundle associated with a vector bundle forms a principal \( GL(n,\mathbb{K})\) -bundle, where \( \mathbb{K}\) is a field of characteristic 0 such as \( \mathbb{R}\) or \( \mathbb{C}\) . The general linear group acts freely and transitively on the frames via basis changes.
The topology on the frame bundle associated to a vector bundle \( E\) is constructed using local trivializations of \( E\) . Each trivialization induces a bijection between the part of the frame bundle lying over an open set \( U_i\) and \( U_i\times GL(n,\mathbb{K})\) , with the final topology ensuring compatibility across overlapping regions. The fibers of the frame bundle are \( GL(n,\mathbb{K})\) -torsors. A torsor (or principal homogeneous space) for a group \( G\) is a non-empty set on which \( G\) acts freely and transitively; equivalently, it is a homogeneous space for \( G\) in which the stabilizer subgroup of every point is trivial.
Exercise 22
Prove that when \( \mathcal{M}\) is equipped with a Riemannian metric the structure group reduces to the orthogonal group \( O(n)\) .
Exercise 23
Prove that when \( \mathcal{M}\) is a \( 2n\) -dimensional symplectic manifold then it has a natural \( Sp(2n,\mathbb{R})\) -structure.
We remark the following fact, relating the manifold structure and the frame bundle.
A natural manifold structure can be given if we notice that the open sets of the typical fiber \( GL(n,\mathbb{R})\) are in bijection with the open sets of \( \mathbb{R}^{n^{2}}\) , and that the structural group of diffeomorphisms of the typical fiber onto itself is simply \( GL(n,\mathbb{R})\) .
Given a differentiable manifold \( \mathcal{M}\) , we consider at each point \( {\scriptstyle M}\) of \( \mathcal{M}\) its corresponding tangent space \( \mathcal{T}_{\scriptstyle M}(\mathcal{M})\) . A vector field \( X\) assigns to every point \( {\scriptstyle M}\in \mathcal{M}\) a tangent vector \( X_{\scriptstyle M}\in \mathcal{T}_{\scriptstyle M}(\mathcal{M})\) in a smooth manner. This means that if \( f\) is a smooth function on \( \mathcal{M}\) , the map \( {\scriptstyle M}\mapsto X_{\scriptstyle M}(f)\) must also be smooth.
In more concrete words, given a subset of the Euclidean space \( \mathbb{R}^n\) , a vector field is represented by a vector-valued function.
More fundamentally, a smooth vector field \( X\) is a linear map \( X:C^{\infty}(\mathcal{M})\to C^{\infty}(\mathcal{M})\) where \( C^{\infty}(\mathcal{M})\) is the algebra of smooth functions and where \( X\) satisfies the Leibniz rule. This property reveals the deep algebraic nature of vector fields: they form a Lie algebra under the Lie bracket, defined as
\[ [X,Y](f)=X(Y(f))-Y(X(f)), \]
where \( f\in C^{\infty}(\mathcal{M})\) and \( X,Y\) are smooth vector fields on \( \mathcal{M}\) .
To make the link with the previous section on the tangent bundle, let us highlight that a vector field on \( \mathcal{M}\) can be defined as a section of the tangent bundle. This is stated more precisely in the following definition.
Definition 68 (Vector field)
A vector field \( X\) on a manifold \( \mathcal{M}\) is a cross-section of the tangent bundle \( \mathcal{T}(\mathcal{M})\) .
Using this approach, we can say that a vector field \( X\) associates to each \( {\scriptstyle M} \in \mathcal{M}\) a tangent vector \( X_{\scriptstyle M} \in T_{\scriptstyle M}(\mathcal{M})\) by the mapping \( X: {\scriptstyle M} \mapsto ({\scriptstyle M}, X_{\scriptstyle M})\) .
A vector field \( X \in \mathcal{T}(\mathcal{M})\) acts on a differentiable function \( f\) on \( \mathcal{M}\) by
\[ (Xf)({\scriptstyle M})=X_{\scriptstyle M}(f), \tag{30} \]
or in local coordinates,
\[ (Xf)({\scriptstyle M})=X^{i}_{\scriptstyle M}\,\frac{\partial f}{\partial x^{i}}\Big|_{\scriptstyle M}, \tag{31} \]
where the functions \( X^i\) , defined in a chart \( (\mathcal{U},\varphi)\) in the neighborhood of \( {\scriptstyle M}\) by \( X_{\scriptstyle M}^{i}=(X(\varphi^{i}))({\scriptstyle M})\) , are called the components of \( X\) with respect to the local coordinates \( x^i=\varphi^{i}({\scriptstyle M})\) .
Definition 69 (\( \mathcal{C}^{r}\) -vector field)
By definition, a vector field on a \( \mathcal{C}^{k}\) -manifold \( \mathcal{M}\) is \( \mathcal{C}^{r}\) -differentiable if the mapping \( \mathcal{M} \to T(\mathcal{M})\) is \( \mathcal{C}^{r}\) -differentiable, where \( r\leq k-1.\)
Going back to our first definition, we can refine it: a vector field \( X\) on a \( \mathcal{C}^{k}\) -manifold \( \mathcal{M}\) can be understood as a derivation on the algebra \( \mathcal{C}^{k}(\mathcal{M})\) of functions of class \( \mathcal{C}^{k}\) on \( \mathcal{M}\) :
Let us introduce the space \( \mathfrak{T}(\mathcal{M})\) of all \( \mathcal{C}^{\infty}\) vector fields on \( \mathcal{M}\) , that is, the space of all vector fields \( X\) such that \( X(f)\) is \( \mathcal{C}^{\infty}\) for all \( \mathcal{C}^{\infty}\) functions \( f\) on \( \mathcal{M}\) . Under the addition and the multiplication operations:
\[ (X+Y)(f)=X(f)+Y(f),\qquad (gX)(f)=g\,X(f),\quad g\in C^{\infty}(\mathcal{M}), \]
this forms a module over the ring \( C^{\infty}(\mathcal{M})\) , but it is not an algebra for the product of vector fields.
Note A ring is a set \( \mathfrak{R}\) with two internal laws, addition and multiplication. For the addition \( \mathfrak{R}\) is an abelian group, the multiplication is associative and distributive with respect to addition.
A module \( \mathfrak{T}\) over the ring \( \mathfrak{R}\) is an abelian group together with an external operation, called scalar multiplication such that
The product \( XY\) of two vector fields defined by \( (XY)(f)=X(Y(f))\) does not satisfy the Leibniz rule. Indeed,
\[ (XY)(fg)=X\big(Y(f)\,g+f\,Y(g)\big)=(XY)(f)\,g+Y(f)\,X(g)+X(f)\,Y(g)+f\,(XY)(g). \]
Therefore \( XY\) is in general not a vector field.
However, notice that the Lie bracket \( [\cdot,\cdot]\) defined by
\[ [X,Y](f)=X(Y(f))-Y(X(f)) \tag{32} \]
does form a vector field. The multiplication defined by the Lie bracket is not associative, but it satisfies the Jacobi identity:
\[ [X,[Y,Z]]+[Y,[Z,X]]+[Z,[X,Y]]=0. \tag{33} \]
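A small computation illustrates why the bracket is a vector field while the product is not. The example is added for illustration and assumes the commutator definition \( [X,Y](f)=X(Y(f))-Y(X(f))\) ; the fields below are chosen only for this purpose. On \( \mathbb{R}^2\) take \( X=\partial/\partial x\) and \( Y=x\,\partial/\partial y\) . For a smooth function \( f\) ,
\[ X(Y(f))=\frac{\partial f}{\partial y}+x\,\frac{\partial^{2} f}{\partial x\,\partial y},\qquad Y(X(f))=x\,\frac{\partial^{2} f}{\partial y\,\partial x},\qquad [X,Y](f)=\frac{\partial f}{\partial y}. \]
The second-order terms cancel, so \( [X,Y]=\partial/\partial y\) is again a first-order operator, whereas \( XY\) alone is not. In local coordinates the same cancellation gives the general formula
\[ [X,Y]^{i}=X^{j}\,\frac{\partial Y^{i}}{\partial x^{j}}-Y^{j}\,\frac{\partial X^{i}}{\partial x^{j}}. \]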
Lemma 1
The set \( \mathfrak{T}(\mathcal{M})\) is a Lie algebra for the Lie bracket.
Proof
The proof is mostly described above.
Definition 70 (Moving Frame)
A set of \( n\) linearly independent differentiable vector fields \( \{e_{i}\}_{i=1}^{n }\) which form a basis of the module \( \mathfrak{T}(U)\) where \( U\subset \mathcal{M}\) is called a moving frame.
Notice that a moving frame may not exist globally on \( T(\mathcal{M})\) .
To end this section, let us mention that vector fields can be used for instance in Index Theory. The Poincaré–Hopf theorem relates zeros of vector fields on \( \mathcal{M}\) to the Euler characteristic of \( \mathcal{M}\) .
The topic considered in this section relates to cotangent bundles. A cotangent bundle is the natural dual to the tangent bundle, encoding the differential structures of a manifold in their most intrinsic form. While the tangent bundle \( \mathcal{T}(\mathcal{M})\) describes directions of motion, the cotangent bundle \( \mathcal{T}^*(\mathcal{M})\) is the realm of differentials.
For a differentiable manifold \( \mathcal{M}\) the cotangent space at each point consists of linear functionals on the tangent space. That is, if \( X_{\scriptstyle M}\) is a tangent vector, an element of \( \mathcal{T}_{\scriptstyle M}^*(\mathcal{M})\) assigns to it a real number by evaluating a differential:
Definition 71 (Cotangent bundle)
The bundle space \( (\mathcal{T}^\star(\mathcal{M}),\mathcal{M},\pi)\) where \( \mathcal{T}^\star(\mathcal{M})\) is the space of pairs \( ({\scriptstyle M},\omega_{\scriptstyle M})\) for all \( {\scriptstyle M}\in \mathcal{M}\) and all \( \omega_{\scriptstyle M}\in \mathcal{T}^\star_{\scriptstyle M}(\mathcal{M})\) is called cotangent bundle space.
In the case of a n-dimensional manifold, the cotangent-bundle \( (\mathcal{T}^{\star}(\mathcal{M}), \mathcal{M}, \pi,G)\) can be identified to the natural bundle
Definition 72 (Covector field)
A \( \mathcal{C}^k\) -differentiable 1-form \( \omega\) is a \( \mathcal{C}^k\) -cross section of the cotangent bundle. It is often called a covariant vector field.
A covariant vector field \( \omega\) associates to each \( {\scriptstyle M}\in \mathcal{M}\) a covariant vector \( \omega_{\scriptstyle M} \in \mathcal{T}^{\star}_{\scriptstyle M}(\mathcal{M})\) by the mapping
\[ \omega:\ {\scriptstyle M}\mapsto({\scriptstyle M},\omega_{\scriptstyle M}). \]
The covector field \( \omega \in \mathcal{T}^{\star}(\mathcal{M})\) acts on the vector field \( X\in T(\mathcal{M})\) by
\[ (\omega(X))({\scriptstyle M})=\omega_{\scriptstyle M}(X_{\scriptstyle M}). \tag{34} \]
If we denote by \( \{dx^{i}\}_{i=1}^{n}\) the dual basis of the natural basis \( \{\partial_{i}=\partial/\partial x^{i}\}_{i=1}^{n}\) ,
\[ \omega=\omega_{i}\,dx^{i} \]
and
\[ \omega(X)=\omega_{i}\,X^{i}, \]
where the index \( {\scriptstyle M}\in \mathcal{M}\) has been omitted.
From the equation (20), we can define the differential 1-form field by:
\[ df(X)=X(f). \tag{35} \]
In the natural basis
\[ df=\frac{\partial f}{\partial x^{i}}\,dx^{i}, \]
and in an arbitrary local basis \( \{e_{i}\}_{i=1}^{n}\) in \( \mathcal{T}_{\scriptstyle M}(\mathcal{M})\) , where \( \{\varepsilon^{i}\}_{i=1}^{n}\) is the dual basis, we have:
\[ df(X)=X^{i}\,e_{i}(f), \]
and finally
\[ df=e_{i}(f)\,\varepsilon^{i}. \tag{36} \]
Assume \( \mathcal{M}\) and \( \mathcal{M}'\) are two manifolds. Let:
\[ f:\ \mathcal{M}\to\mathcal{M}' \]
be a differentiable mapping between \( \mathcal{M}\) and \( \mathcal{M}'\) , such that \( {\scriptstyle M}\in \mathcal{M}\) is mapped to \( {\scriptstyle M'}=f({\scriptstyle M})\in \mathcal{M}'\) .
The map \( f\) induces a linear (Jacobian) mapping denoted \( f_{\star}\) , between the tangent space \( T_{\scriptstyle M}(\mathcal{M})\) at \( {\scriptstyle M}\) and the tangent space \( T_{\scriptstyle M'}(\mathcal{M}')\) at \( {\scriptstyle M'}=f({\scriptstyle M})\) ,
\[ f_{\star}:\ T_{\scriptstyle M}(\mathcal{M})\to T_{\scriptstyle M'}(\mathcal{M}'), \]
defined in the following way.
Let \( g\) be a differentiable function in the neighborhood of \( f({\scriptstyle M})\) . Then, we obtain:
\[ (f_{\star}X_{\scriptstyle M})(g)=X_{\scriptstyle M}(g\circ f). \tag{37} \]
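In coordinates the induced map is multiplication by the Jacobian matrix. The following sketch is added for illustration; it assumes the definition \( (f_{\star}X_{\scriptstyle M})(g)=X_{\scriptstyle M}(g\circ f)\) , and the coordinates \( y^{a}\) on \( \mathcal{M}'\) are a notation introduced here. With \( f^{a}=y^{a}\circ f\) ,
\[ (f_{\star}X_{\scriptstyle M})^{a}=\frac{\partial f^{a}}{\partial x^{i}}\Big|_{\scriptstyle M}\,X^{i}_{\scriptstyle M}. \]
For instance, let \( f:\mathbb{R}\to\mathbb{R}^2\) , \( f(t)=(t,t^{2})\) , and \( X=d/dt\) at \( t=1\) . Then \( f_{\star}X=\partial/\partial x+2\,\partial/\partial y\) at the point \( (1,1)\) . As a check with \( g(x,y)=xy\) :
\[ X(g\circ f)=\frac{d}{dt}\,t^{3}\Big|_{t=1}=3,\qquad (f_{\star}X)(g)=\big(y+2x\big)\Big|_{(1,1)}=3. \]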
Theorem 13
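Let \( f:\mathcal{M} \to \mathcal{M}'\) be a \( C^\infty\) diffeomorphism between two \( n\) -dimensional differentiable manifolds. Then \( f_{\star}\) is an isomorphism of the Lie algebras:
\[ f_{\star}:\ \mathfrak{T}(\mathcal{M})\to\mathfrak{T}(\mathcal{M}') \]
given by
\[ (f_{\star}X)(g)=X(g\circ f)\circ f^{-1},\qquad g\in C^{\infty}(\mathcal{M}'). \tag{38} \]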
Proof
Exercise!
Definition 73 (Invariant vector field)
A vector field \( X\) on \( \mathcal{M}\) is said to be invariant under the diffeomorphism
\[ f:\ \mathcal{M}\to\mathcal{M} \]
if
\[ f_{\star}X=X. \tag{39} \]
Let \( \omega_x\in T^{\star}_x(\mathcal{M})\) and \( \omega'_{\scriptstyle M'}\) be such that
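The pull-back (reciprocal image) \( f^{\star}: \mathcal{T}^{\star}_{\scriptstyle M'}(\mathcal{M}') \to \mathcal{T}^{\star}_{\scriptstyle M} (\mathcal{M})\) of a covariant vector \( \omega'_{\scriptstyle M'}\) under the differentiable mapping \( f\) is defined by the equality:
\[ (f^{\star}\omega'_{\scriptstyle M'})(X_{\scriptstyle M})=\omega'_{\scriptstyle M'}(f_{\star}X_{\scriptstyle M}),\qquad X_{\scriptstyle M}\in \mathcal{T}_{\scriptstyle M}(\mathcal{M}). \tag{40} \]
Therefore, the pull-back \( f^\star: T^{\star}(\mathcal{M}') \to T^{\star}(\mathcal{M})\) of the 1-form \( \omega'\) under a differentiable mapping \( f\) is defined by
\[ (f^{\star}\omega')_{\scriptstyle M}(X_{\scriptstyle M})=\omega'_{f({\scriptstyle M})}(f_{\star}X_{\scriptstyle M}). \tag{41} \]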
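A simple computation shows the pull-back at work. The example is added for illustration and assumes the usual definition \( (f^{\star}\omega')(X)=\omega'(f_{\star}X)\) ; the map and the form are chosen only for this purpose. Let \( f:\mathbb{R}\to\mathbb{R}^2\) , \( f(t)=(\cos t,\sin t)\) , and \( \omega'=x\,dy-y\,dx\) . Substituting \( x=\cos t\) , \( y=\sin t\) gives
\[ f^{\star}\omega'=\cos t\;d(\sin t)-\sin t\;d(\cos t)=(\cos^{2}t+\sin^{2}t)\,dt=dt. \]
Equivalently, \( f_{\star}(d/dt)=-\sin t\,\partial/\partial x+\cos t\,\partial/\partial y\) and \( \omega'\big(f_{\star}(d/dt)\big)=\cos^{2}t+\sin^{2}t=1\) . Note that \( f\) is not invertible here, yet the pull-back is perfectly well defined.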
Remark 6
The expression of the reciprocal image of a 1-form does not involve \( f^{-1}\) , whereas it is the case for a vector field.
We now generalise the discussion of tensors from the previous chapter. This notion can also be formulated using the concept of fiber bundles.
In particular, a fiber bundle where the base space is the manifold \( \mathcal{M}\) and the fiber is identified with \( \otimes^{p}T^{\star}_{\scriptstyle M}(\mathcal{M})\otimes^{q}T_{\scriptstyle M}(\mathcal{M})\) at all \( {\scriptstyle M}\in \mathcal{M}\) is called a \( (p,q)\) -tensor bundle.
Notice that a tangent bundle is a \( (0,1)\) -tensor bundle and that the cotangent bundle is a \( (1,0)\) -tensor bundle.
Definition 74
A \( (p,q)\) -tensor field on a \( C^{k}\) -manifold is a \( C^{r}\) cross-section, with \( r\leq k-1\) , of the \( (p,q)\) -tensor bundle.
The operations defined in section 2.3.5 for tensors at a point carry over fiber-wise, which defines the corresponding operations on tensor fields.
Under these operations, the set of all \( (p,q)\) -tensor fields of class \( \mathcal{C}^{r}\) is a module over the ring \( \mathcal{C}^{r}(\mathcal{M})\) .
In differential geometry, connections and parallel transport arise from a fundamental necessity: the ability to compare vectors at different points on a curved space, where a naive translation is no longer well-defined.

In Euclidean space, one can simply compare vectors by parallel translation, that is, by shifting them without changing their magnitude or direction.
However, on a curved manifold, the very act of moving a vector must be prescribed by an additional structure, which is provided by a connection.
A connection defines a rule for differentiation that respects the manifold’s geometry, replacing the notion of partial derivative in a given direction with a well-defined covariant derivative. This operator tells us how a vector field changes as we move along the manifold, encoding information about curvature and torsion.
But the true power of connections emerges in the concept of parallel transport. Given a connection, we can transport a vector along a curve, according to the manifold’s structure. This operation is at the heart of many geometric investigations.
In Riemannian geometry, parallel transport reveals the curvature of space, since moving a vector along different paths can lead to different final results. In fact, given a loop \( \gamma:[a,b]\to \mathcal{M}\) , with \( [a,b]\subset \mathbb{R}\) and \( \gamma(a)=\gamma(b)\) , on a smooth manifold \( \mathcal{M}\) , we can transport a vector \( \overrightarrow{v}\) along that curve, starting at the initial point \( \gamma(a)\) .
The question is whether the vector has been modified during this transport, namely whether the transported vector at \( \gamma(b)\) coincides with the initial vector \( \overrightarrow{v}\) at \( \gamma(a)\) . If the two vectors coincide for every sufficiently small loop in a region, then the manifold is flat on that region.
In gauge theory, parallel transport describes how fields interact, with connections playing the role of gauge potentials in Yang-Mills theory.
In general relativity, the Levi–Civita connection governs the motion of free-falling observers in curved spacetime.
Thus, a connection is not merely a technical tool but an essential ingredient of geometry, allowing us to define differentiation, curvature, and transport in a way that transcends local coordinates. It is the bridge between the infinitesimal and the global, between algebra and topology, between abstract geometry and the laws of nature.
In Euclidean space, two vectors with different origins are compared by a parallel translation of the vectors to a common origin. The derivative of a vector \( \overrightarrow{v}\) defined along a curve is a vector with components \( (\partial v^{i}/\partial x^{j}) dx^{j}/dt\) in Cartesian coordinates. On an arbitrary differentiable manifold, the components
do not behave as the components of a tensor under a change of coordinates. To resolve this difficulty, we introduce the notion of covariant derivative, which is of tensorial type.
Definition 75 (Linear connection)
A linear connection on a differentiable manifold \( \mathcal{M}\) is a mapping \( \nabla : X\mapsto \nabla X\) from the vector fields on \( \mathcal{M}\) into the differentiable tensor fields of type \( (1,1)\) on \( \mathcal{M}\) such that
(42)
where \( f\) is a differentiable function on \( \mathcal{M}\) .
The tensor \( \nabla X \) is called the covariant derivative of \( X\) .
For an \( n\) -dimensional differentiable manifold \( \mathcal{M}\) equipped with a moving frame (Definition 70) \( \{e_{i}\}_{i=1}^{n}\) , \( e_{i}\in T(\mathcal{M})\) , and its dual \( \{\varepsilon^{i}\}_{i=1}^{n}\) , \( \varepsilon^{i}\in T^{\star}(\mathcal{M})\) , the covariant derivative \( \nabla e_{i}\) is a \( (1,1)\) -tensor field which can be written as:
(43)
and the \( (1,1)\) -tensor \( \nabla e_{i}\) can be expressed as:
(44)
Definition 76 (Connection coefficients)
The coefficients \( \gamma^{j}_{ki}=\nabla e_{i}(e_{k},\varepsilon^{j})\) are called the connection coefficients in the basis \( \{e_{i}\}_{i=1}^{n}\) .
It follows from the definition 75 and the equation (19) that
where \( \nabla_{k}X^{i}=\big (\nabla X \big)^{i}_{k}\) denotes the components of the tensor \( \nabla X\) .
(45)
Furthermore, one can rewrite the \( (1,1)\) -tensor \( \nabla X\) in the following way:
(46)
making the covariant part explicit.
Definition 77 (Connection forms)
The quantities defined by
(47)
are called the connection 1-forms.
We have the relation
(48)
As \( \nabla X\) is a \( (1,1)\) -tensor it is invariant under the change of natural coordinate system. In particular:
and
It follows from (46) that
Moreover,
and therefore
which, by equation (20), gives us
(49)
The connection coefficients are not components of a tensor because of the first term on the right hand side of the equation (49).
In the local coordinate system \( \{x^{1},\dots,x^{n}\}\) , we have:
(50)
where the \( \Gamma^{i}_{k\ell}\) are the connection coefficients (44) in the local (natural) coordinate system \( \{x^{1},\dots,x^{n}\}\) . For the vector field \( \partial_i=\frac{\partial}{\partial x^i}\) the components are all \( 0\) except for \( X^i=1\) , and we can express:
Definition 78 (Christoffel symbols)
The \( \Gamma^{j}_{ki}\) are called the Christoffel symbols.
(51)
Now, the Christoffel symbols transform under a change of local coordinates system: \( \{x^{1},\dots,x^{n}\}\to \{{x'}^{1},\dots,{x'}^{n}\}\) as
(52)
Hence, the Christoffel symbols do not define a tensor.
Let \( X\) and \( Y\) be two vector fields. By (50), we get \( \nabla X(Y)=Y^{k}\nabla_{k}X^{i}e_{i}=Y^{k} \nabla_{k}X\) , which can be identified with the derivative (4) in the direction of \( Y\) . This leads us to the following definition of the covariant derivative.
Definition 79 (Covariant derivative in the direction \( Y\))
The covariant derivative \( \nabla_{_{Y}}X\) of \( X\) in the direction \( Y\) is defined by
(53)
The covariant derivative \( \nabla_{_{Y}}X\) is linear in \( Y\)
where
In the frame \( \{e_{i}\}_{i=1}^{n}\) , we obtain:
and thus
(54)
where \( \nabla_{e_{i}}\) denotes the covariant derivative in the direction \( e_i\) .
The answer to this question is yes. The notion of a covariant derivative in the direction of \( X\in \mathcal{T}_{_{M}}(\mathcal{M})\) can be extended, in a neighborhood of \( M\in \mathcal{M}\) , to an arbitrary type of tensor field, under the following assumptions:
(55.a)
(55.b)
(55.c)
(55.d)
Therefore, if \( t\) is a \( (p,q)\) -tensor, its covariant derivative is a \( (p+1,q)\) -tensor:
where
We will study the above definition on the example of a 1-form. By the fourth condition above, we have that
which implies that
If we substitute \( X\) by \( e_{i}\) this gives
and thus
which means in particular that
(56)
This allows us to deduce that
Proposition 3 (Covariant derivative of 1-form)
Let \( \mathbf{\alpha}\in T^{\star}_{_{M}}(\mathcal{M})\) . Then
(57)
Additionally, if \( \{e_{i}\}_{i=1}^{n}\) is a moving frame and \( \{\varepsilon^{i}\}_{i=1}^{n}\) is its dual, then
(58)
Proof
The proof is left as an exercise.
We now explore the covariant derivative of a \( (2,0)\) -tensor. Let \( g=g_{\mu\nu}\varepsilon^\mu\otimes \varepsilon^{\nu}\) be a \( (2,0)\) -tensor.
The covariant derivative of \( g\) is obtained from the third and fourth conditions, (55.c) and (55.d), in the list above.
Therefore,
or
(59)
Exercise 24
The generalization is rather straightforward and left as an exercise.
Exercise 25
As an example, one can work with a \( (2,1)\) -tensor \( t_{kl}^i\) and show that :
We continue with the next remark.
Remark 7
The covariant derivative in the direction \( e_{i}\) of the tensor product of two tensors \( s\) and \( t\) satisfies, by (55.c):
However, it is important to keep in mind that the sum of two tensor products is defined only if the corresponding factors have the same rank and therefore:
In Euclidean space, vectors based at different points can be compared by translating them parallelly to a common origin.
Consequently, if a vector \( \overrightarrow{v}\) is transported parallelly along a curve \( \gamma\) , the derivative \( dv/dt\) vanishes.
On a manifold \( \mathcal{M}\) , to compare vectors at two different points \( M\in \mathcal{M}\) and \( M'\in \mathcal{M}\) , it is necessary to be able to assign a uniquely defined frame at \( M'\) based on a frame at \( M\) .
This is precisely where the notion of connection plays a key role: by enabling the existence of a covariant derivative, a connection provides a convenient notion of parallel transport.
Definition 80 (Parallel transport)
A vector \( X\) is said to be parallel along the curve \( \gamma : t\to \gamma(t)\) if
(60)
Remark 8
The vector \( u\) is defined only at points lying on the curve \( \gamma\) . However, it can be extended to a vector field on a neighborhood of the curve \( \gamma\) .
If \( (\varphi,U)\) is a local chart, the coordinates of a point \( \gamma(t)\) on the curve \( \gamma\) are
and the components of \( u\) are given by \( u^{i}=dx^{i}/dt\) .
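As a numerical illustration of parallel transport in a chart, here is a minimal Python sketch. It assumes, as an example not taken from the text, the round unit sphere with coordinates \( (\theta,\varphi)\) and its Levi-Civita Christoffel symbols, and it assumes that condition (60) takes the usual coordinate form \( dv^{i}/dt+\Gamma^{i}_{jk}u^{j}v^{k}=0\) . The function name `transport_around_latitude` is hypothetical.

```python
import math

def transport_around_latitude(theta0, v, steps=2000):
    """Parallel-transport v = (v_theta, v_phi) once around the circle theta = theta0.

    Assumption: round unit sphere, Levi-Civita connection, with
        Gamma^theta_{phi phi} = -sin(theta) cos(theta),
        Gamma^phi_{theta phi} = Gamma^phi_{phi theta} = cot(theta).
    The curve is t -> (theta0, t), so its tangent is u = (0, 1).
    """
    s, c = math.sin(theta0), math.cos(theta0)

    def rhs(w):
        # dv^i/dt = -Gamma^i_{jk} u^j v^k with u = (0, 1)
        v_theta, v_phi = w
        return (s * c * v_phi, -(c / s) * v_theta)

    def shifted(w, k, h):
        return (w[0] + h * k[0], w[1] + h * k[1])

    h = 2.0 * math.pi / steps
    w = tuple(v)
    for _ in range(steps):  # classical fourth-order Runge-Kutta step
        k1 = rhs(w)
        k2 = rhs(shifted(w, k1, h / 2))
        k3 = rhs(shifted(w, k2, h / 2))
        k4 = rhs(shifted(w, k3, h))
        w = (w[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             w[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return w

equator = transport_around_latitude(math.pi / 2, (1.0, 0.0))
latitude = transport_around_latitude(math.pi / 3, (1.0, 0.0))
print(equator, latitude)
```

Under these assumptions, the vector transported around the equator (a geodesic) returns unchanged, whereas the vector transported around the latitude \( \theta_0=\pi/3\) returns rotated by the angle \( 2\pi\cos\theta_0=\pi\) , that is, reversed.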
We briefly digress on the notion of geodesics, as they are intimately related to the topic discussed in this section.
Definition 81 (Affine geodesic)
An affine geodesic on \( \mathcal{M}\) is a curve
such that
(61)
for some function \( \lambda\) on \( \mathbb{R}\) .
The curve \( \gamma\) is called a geodesic i.e.
(62)
The concept of geodesics generalizes the notion of straight line in Euclidean space.
Given a smooth curve \( \gamma:I\to M\) , \( I=[a,b]\subset \mathbb{R}\) , its length is given by
In particular, this notion becomes interesting as soon as we want to better understand the intrinsic distance between two points on a manifold. Assume \( \mathcal{M}\subset \mathbb{R}^n\) is an \( m\) -dimensional smooth manifold. Take a pair of distinct points \( p,q\in \mathcal{M}\) . Their Euclidean distance is given by \( |p-q|\) in the ambient Euclidean space. However, this distance tells us little from the viewpoint of the manifold \( \mathcal{M}\) itself. We therefore introduce the notion of an intrinsic distance in \( \mathcal{M}\) .
Definition 82
The intrinsic distance on \( \mathcal{M}\) between \( p, q\in \mathcal{M}\) is a real number \( d(p,q)\geq 0\) defined by
where \( \Omega_{p,q}\) is the space of smooth paths of the unit interval joining \( p\) to \( q\) .
Exercise 26
Prove that \( L(\gamma)\geq |p-q|\) .
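The inequality of Exercise 26 can be checked numerically on an example (this is an illustration, not a proof). The sketch below assumes that the length of a smooth curve is \( L(\gamma)=\int_a^b |\dot{\gamma}(t)|\,dt\) and approximates it by the length of an inscribed polygon. The curve, a quarter of a great circle on the unit sphere in \( \mathbb{R}^3\) , and the helper names are hypothetical choices.

```python
import math

def polygonal_length(gamma, a, b, n=1000):
    """Approximate the length of gamma on [a, b] by an inscribed polygon with n segments."""
    points = [gamma(a + (b - a) * i / n) for i in range(n + 1)]
    return sum(math.dist(points[i], points[i + 1]) for i in range(n))

def quarter_circle(t):
    # great-circle arc on the unit sphere from p = (1, 0, 0) to q = (0, 1, 0)
    return (math.cos(math.pi * t / 2), math.sin(math.pi * t / 2), 0.0)

p, q = quarter_circle(0.0), quarter_circle(1.0)
length = polygonal_length(quarter_circle, 0.0, 1.0)  # close to pi / 2
chord = math.dist(p, q)                              # Euclidean distance |p - q|
print(length, chord)
```

Here the length of the path is close to \( \pi/2\) , which exceeds the ambient distance \( |p-q|=\sqrt{2}\) .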

Let us now write the equation of geodesics in a local chart \( (\varphi,U)\) : the coordinates of a point \( \gamma(t)\) on the curve \( \gamma\) are \( x^{i} (t)=\varphi\circ\gamma(t)\) and the components of \( u\) are \( u^{i}=dx^{i}/dt\) .
Let \( (\mathcal{M}, g)\) be a Riemannian manifold, that is a smooth manifold equipped with a Riemannian metric. The Riemannian curvature tensor is defined as a map
characterized by the formula:
where \( [X,Y]\) is a Lie bracket of vector fields and where \( \Gamma(\mathcal{T}\mathcal{M})\) are tangent bundle sections.
Example 12
In the flat case, the absence of curvature ensures geodesic parallelism.
To use a more modern language, we can consider the notion of Riemannian curvature in a more categorical framework, which relies on the notion of sheaf. Roughly speaking, a sheaf assigns local data to open sets and patches these local data together into a global object.
Definition 83
Let \( X\) be a topological space. A presheaf \( \mathcal{F}\) on \( X\) is a contravariant functor from the category of open sets of \( X\) (denoted \( Open(X)\) ), to a category \( \mathscr{C}\) , i.e.,
This definition means that to each open set \( U\subset X\) , we assign an object \( \mathcal{F}(U)\) in the category \( \mathscr{C}\) . The category \( \mathscr{C}\) can be the category of sets, abelian groups, rings, etc.
To each inclusion of open sets \( V\subset U\) , we assign a restriction morphism \( \rho_{U,V}:\mathcal{F}(U)\to \mathcal{F}(V)\) , satisfying:
A sheaf is a presheaf with an extra gluing condition. Namely, for any open covering \( U=\bigcup\limits_{\alpha} U_\alpha\) and a family of local sections that agree on overlaps (i.e. compatible local sections), there exists a unique global section that restricts to them. In particular, \( \mathcal{F}\) is a sheaf if, for any open cover \( U=\bigcup\limits_{\alpha} U_\alpha\) , the sequence is exact:
Let us break down this construction:
Going back to the definition of curvature, the main differences between the previous language and the new formalism are depicted below:
Let \( (\mathcal{M}, g)\) be a Riemannian manifold, where \( \mathcal{M}\) is a smooth manifold equipped with a sheaf \( \mathcal{O}_{\mathcal{M}}\) of smooth functions and a sheaf \( \mathcal{E} = \mathcal{T}_{\mathcal{M}}\) of sections of the tangent bundle, endowed with a Riemannian metric \( g\) . The connection \( \nabla\) is then a morphism of sheaves of \( \mathcal{O}_{\mathcal{M}}\) -modules:
where \( \Omega^1_{\mathcal{M}}\) is the sheaf of Kähler differentials on \( \mathcal{M}\) .
The Riemannian curvature tensor is then a natural transformation of the \( \mathcal{O}_{\mathcal{M}}\) -module functor given by
defined by the relation
where \( [X,Y]\) is the Lie bracket of vector fields, arising from the structure of the Lie algebroid associated with \( \mathcal{E}\) . The operator \( [\nabla_X, \nabla_Y]\) is a commutator of differential operators.
Since the right hand side only depends on the values of \( X, Y, Z\) at a given point, \( R\) is a tensorial object, i.e., a section of the sheaf: \( R \in \Gamma(\mathcal{M}, \operatorname{End}(\mathcal{E}) \otimes \Omega^2_{\mathcal{M}})\) .
This tensor encodes the non-commutativity of the covariant derivative and measures the curvature of the category of \( \mathcal{O}_{\mathcal{M}}\) -modules endowed with \( \nabla\) .
Exercise 27
Show that we can rewrite it as:
Exercise 28
Symmetries: Show that we have the following relations:
(63)
Let \( \mathcal{M}\subset \mathbb{R}^n\) be a smooth \( m\) -dimensional submanifold. Let \( p\in \mathcal{M}\) and let \( E\subset \mathcal{T}_p\mathcal{M}\) be a 2-dimensional linear subspace of the tangent space at \( p\) . The sectional curvature of \( \mathcal{M}\) at \( (p,E)\) is the number
where \( u,v\in E\) are linearly independent and \( R_p\) is the Riemannian curvature tensor.
Exercise 29
Let us consider the manifold of symmetric positive definite matrices i.e.
Compute the sectional curvature for this manifold.
The space of symmetric positive definite matrices has many interesting applications.
Definition 84
Let \( k\in \mathbb{R}\) and \( m\geq 2\) be an integer. An \( m\) -manifold has constant sectional curvature \( k\) if and only if \( K(p,E)=k\) for every \( p\in \mathcal{M}\) and every 2-dimensional linear subspace \( E\subset \mathcal{T}_p\mathcal{M}\) .
Theorem 14
Let \( \mathcal{M}\) be an \( m\) -manifold. Fix an element \( p\in \mathcal{M}\) and a real number \( k\) . Then the following are equivalent: (i) \( K(p,E)=k\) for every 2-dimensional linear subspace \( E\subset \mathcal{T}_p\mathcal{M}\) ; (ii) the Riemann curvature tensor of \( \mathcal{M}\) at \( p\) is given by:
for all \( v_1,\dots, v_4 \in \mathcal{T}_p\mathcal{M}\) .
Let \( M\) be a smooth manifold, and let \( \nabla\) be an affine connection on \( M\) . The presence of torsion in \( \nabla\) quantifies the extent to which infinitesimal parallel transport fails to be symmetric.
Definition 85
The torsion tensor \( \boldsymbol{T}\) associated with a connection \( \nabla\) is the \( (1,2)\) -tensor defined by
for any vector fields \( X,Y\) on \( M\) , where \( [X,Y]\) denotes the Lie bracket and \( \nabla_Y (X)\) is the covariant derivative. The connection \( \nabla\) is said to be torsion-free if \( \boldsymbol{T}=0\) .
In local coordinates, \( \{x^i\}\) , the torsion tensor can be expressed in terms of the connection coefficients \( \Gamma_{ij}^k\) :
where \( \boldsymbol{T}^i_{jk}\) are the components of the torsion tensor, and \( \Gamma_{jk}^i\) are the Christoffel symbols of the connection.
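The coordinate expression of the torsion can be sketched in a few lines of Python. The sketch assumes the usual formula \( \boldsymbol{T}^i_{jk}=\Gamma^i_{jk}-\Gamma^i_{kj}\) (sign and index conventions vary between authors) and stores the coefficients as a nested list `gamma[i][j][k]`. The two sets of coefficients below are hypothetical numerical values chosen only for illustration.

```python
def torsion(gamma):
    """Return T[i][j][k] = gamma[i][j][k] - gamma[i][k][j], with gamma[i][j][k] = Gamma^i_{jk}."""
    n = len(gamma)
    return [[[gamma[i][j][k] - gamma[i][k][j] for k in range(n)]
             for j in range(n)] for i in range(n)]

def zeros(n):
    """Return an n x n x n nested list filled with 0.0."""
    return [[[0.0] * n for _ in range(n)] for _ in range(n)]

# Hypothetical example 1: coefficients symmetric in their lower indices.
symmetric = zeros(2)
symmetric[0][1][1] = -0.5
symmetric[1][0][1] = symmetric[1][1][0] = 2.0

# Hypothetical example 2: one coefficient without its symmetric partner.
twisted = zeros(2)
twisted[0][0][1] = 1.0

T_sym = torsion(symmetric)
T_twi = torsion(twisted)
print(T_sym)
print(T_twi)
```

In the first case all components vanish, so the coefficients describe a torsion-free connection in this chart. In the second case the only non-zero components are \( \boldsymbol{T}^0_{01}=1\) and \( \boldsymbol{T}^0_{10}=-1\) (indices counted from 0, as in the code).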
In Riemannian geometry, the Levi-Civita connection is the unique torsion-free connection that is compatible with the metric. This connection is central to the study of curvature and geodesics.
In the presence of torsion, the equation of geodesic deviation (which describes how nearby geodesics spread apart or converge) includes additional terms involving the torsion tensor.
Intuitively, imagine two nearby geodesics on a surface. If the surface is flat (such as a plane), the geodesics are straight lines, and the distance between them remains constant. However, if the surface is curved (for instance, a sphere), the geodesics may converge or diverge over time. Geodesic deviation quantifies this behavior.
Let \( \mathcal{M}\) be a smooth manifold equipped with a sheaf \( \mathcal{O}_{\mathcal{M}}\) of smooth functions and a sheaf \( \mathcal{E} = \mathcal{T}_{\mathcal{M}}\) of sections of the tangent bundle. A Riemannian structure (or a semi-Riemannian structure in the case of Lorentzian manifolds) is given by a metric \( g: \mathcal{E} \otimes_{\mathcal{O}_{\mathcal{M}}} \mathcal{E} \to \mathcal{O}_{\mathcal{M}}\) .
A geodesic on \( \mathcal{M}\) is a path satisfying the equation
where \( \nabla\) is the Levi-Civita connection associated with \( g\) .
Now, consider a one-parameter family of geodesics, represented by the morphism
where \( I\) is an interval in \( \mathbb{R}\) parametrizing proper time or arc length, and the second parameter represents a perturbation of the initial geodesic. The geodesic deviation vector field is given by
which defines a section of \( \mathcal{E}\) along the central geodesic \( \gamma(s=0)\) . This vector field satisfies the Jacobi equation, which in sheaf-theoretic terms can be written as
Here, \( R: \mathcal{E} \times_{\mathcal{O}_{\mathcal{M}}} \mathcal{E} \to \operatorname{End}(\mathcal{E})\) is the Riemann curvature tensor, viewed as a section of the sheaf \( \operatorname{End}(\mathcal{E}) \otimes_{\mathcal{O}_{\mathcal{M}}} \Omega^2_{\mathcal{M}}\) .
In the case where \( (\mathcal{M}, g)\) is flat (i.e., \( \mathcal{M}\) is locally isometric to an open subset of \( \mathbb{R}^n\) with the Euclidean metric), we have \( R = 0\) , and hence \( \mathcal{J}\) satisfies
This implies that geodesics remain equidistant, meaning that nearby geodesics neither converge nor diverge.
If \( (\mathcal{M}, g)\) is curved, then the curvature tensor \( R\) introduces a term that governs the deviation of geodesics. The sign and structure of \( R(\dot{\gamma}, \mathcal{J}) \dot{\gamma}\) determine whether nearby geodesics converge or diverge, capturing the intrinsic curvature of \( \mathcal{M}\) in a functorial manner.
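The dichotomy between the flat and the curved case can be illustrated numerically. The sketch below relies on a simplifying assumption that is not made in the text: on a surface of constant sectional curvature \( K\) , for a unit-speed geodesic and a deviation field normal to it, the Jacobi equation reduces to the scalar equation \( J''+KJ=0\) . The function name and the initial conditions \( J(0)=1\) , \( J'(0)=0\) (geodesics that start out parallel) are hypothetical choices.

```python
import math

def jacobi_field(K, t_end, J0=1.0, dJ0=0.0, steps=2000):
    """Integrate J'' + K J = 0 on [0, t_end] with RK4 and return J(t_end)."""
    def rhs(J, dJ):
        return dJ, -K * J

    h = t_end / steps
    J, dJ = J0, dJ0
    for _ in range(steps):
        a1, b1 = rhs(J, dJ)
        a2, b2 = rhs(J + h / 2 * a1, dJ + h / 2 * b1)
        a3, b3 = rhs(J + h / 2 * a2, dJ + h / 2 * b2)
        a4, b4 = rhs(J + h * a3, dJ + h * b3)
        J += h / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
        dJ += h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
    return J

t = math.pi / 2
flat = jacobi_field(0.0, t)         # stays equal to 1: geodesics remain equidistant
sphere = jacobi_field(1.0, t)       # cos(t): geodesics converge and meet at t = pi/2
hyperbolic = jacobi_field(-1.0, t)  # cosh(t): geodesics diverge
print(flat, sphere, hyperbolic)
```

With these initial conditions the separation stays constant for \( K=0\) , vanishes at \( t=\pi/2\) for \( K=1\) (meridians of the unit sphere meeting at a pole), and grows like \( \cosh t\) for \( K=-1\) .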
Thus, geodesic deviation measures the sheaf-theoretic failure of parallel transport to be trivial in the presence of curvature.
The torsion tensor is skew-symmetric in its lower indices:
If \( \boldsymbol{T}=0\) , the connection is said to be torsion-free. In this case, the connection coefficients satisfy
and the connection is symmetric.
The torsion tensor and the curvature tensor \( R\) of a connection are related via the Bianchi identity:
The First Bianchi Identity for a connection with torsion can be written as:
(64)
In components, this can be rewritten as:
For a more categorical approach, let \( \mathcal{E}\) be a sheaf of sections of the tangent bundle (or a vector bundle over a space), and let \( \nabla\) be a connection on \( \mathcal{E}\) . The curvature \( R^{\nabla}\) and torsion \( \boldsymbol{T}\) appear as components of a functorial construction on the Atiyah algebroid (the sheaf of first-order differential operators preserving a given structure).
In particular, using the Lie-algebroid formalism, the first Bianchi identity corresponds to the failure of the Spencer differential \( d_\nabla\) acting on torsion to be zero, that is:
(65)
where
Specifically, if you parallel transport a vector \( Y\) along a curve tangent to \( X\) ,
Example 13
The torsion can be interpreted as an obstruction to the integrability of certain geometric structures, such as foliations or distributions on a manifold.
Example 14
The torsion plays a central role in non-Riemannian geometries, such as Finsler geometry and teleparallel gravity. In teleparallel gravity, the torsion tensor is used to describe the gravitational field, replacing the curvature tensor of general relativity.
In this chapter we give a short overview of probabilistic and statistical notions. We will see that everything that has been introduced so far can be used in the context of probability and statistics, and in particular in information theory. This combination gave birth to the theory of information geometry.
We will also recall some notions of distance or divergence between probabilities, which is important for the “learning process”.
When studying a mathematical problem in statistics, several key pieces of information come into play:
In this context, the first item is associated with a collection of inputs gathered from the sample space \( \Omega\) . The second item involves the probability distribution \( P[\cdot]\) , which comes from a family of distributions \( {P_{\theta}}\) , where the parameter \( \theta\) belongs to the parameter space \( \Theta\) . Overall, this transforms the issue of mathematical statistics into a decision-making problem, as one must determine which pieces of information are most beneficial. Formally, every statistical problem relates to a measurable space \( (\Omega, \mathbf{S})\) , which captures the outcomes of the observed phenomenon, a family of probability distributions \( {P_{\theta}}\) , and potentially additional a priori knowledge about the unknown phenomenon linked to another measurable space \( (\Omega', \mathbf{S'})\) , which may concern hypotheses or actions taken.
The basic notion in probability theory is the one of random experiment. Such an experiment is mathematically described by a probability space characterized by the triplet \( (\Omega,\mathbf{S}, P)\) where:
Definition 86 (\( \sigma\) -algebra)
A family \( {\mathbf{S}}\) of subsets of a sample space \( \Omega\) is a \( \sigma\) -algebra if it satisfies the following properties:
The pair \( (\Omega,{\mathbf{S}})\) is called a measurable space.
Examples 2
Exercise 30
Show that the \( \sigma\) -algebra generated by \( n\) disjoint subsets of a set \( \Omega\) has cardinality \( 2^n\)
Definition 87
A probability \( P\) is the following map
which satisfies the following properties:
From this definition, we obtain the following useful relations:
To illustrate the construction of probability spaces, let us discuss the following examples.
Example 15
A fixed coin is flipped independently \( n\) times. The sequence of outcomes \( (\omega_1, \dots, \omega_n)\) , consisting of heads \( H\) and tails \( T\) , might look like
To determine whether a coin is biased, we need to analyze the probability of obtaining heads \( H\) and tails \( T\) over multiple independent flips.
The question is whether we can assume that the coin in question is not biased.
Equivalently, we can ask: does the probability of obtaining heads \( H\) deviate from \( \frac{1}{2}\) by less than \( 10^{-3}\) ?
Now, consider the following situation:
Example 16
The coin is flipped independently, but the probability distribution governing the outcomes is unknown. The only available information is a constant parameter \( \theta \) .
We do know, however, that the probability of observing any specific sequence \( \omega\) with \( k(\omega)\) heads \( H\) follows the Bernoulli probability law (this is the Bayesian approach):
Consequently, the probability distribution \( P[\cdot]\) is given by
where
and
This defines a parametric curve in the \( (n - 1)\) -dimensional simplex of all probability distributions on the sample space \( \Omega\) , where \( n=2^N \) .
Now, suppose we impose a lexicographical order on all possible sequences of \( \omega\) ’s. Then, the number \( k({\omega}_j) \) corresponds to the number of zeros in the binary representation of the integer \( j - 1 \) .
To end our discussion on the example of the flipping coin, we observe the following.
Example 17
For \( N=1\) , we obtain \( n=2^1=2\) . The unknown probability distribution \( P[\cdot]\) is represented by a “curve” with parametric equation
for all probability distributions on \( \Omega\)
in the \( 1\) -dimensional simplex, that is on the segment.
For \( N=2\) , we obtain \( n=4\) . The probability distribution \( P[\cdot]\) is represented by a curve with parametric equation:
where
and
In this case, the parametric curve is a curve in the \( 3\) -dimensional probabilistic simplex.
We explain how this curve behaves in relation to the representation of words in letters \( H\) and \( T\) .
We begin with \( \theta=0\) , which corresponds to the vertex \( P[TT]=1\) . Then, the curve goes through the center of our simplex, since we have \( P[HH]=P[HT]=P[TH]=P[TT]=\frac{1}{4}\) and where \( \theta =\frac{1}{2}\) . The endpoint of the curve is given by the vertex \( P[HH]\) which corresponds to \( (\theta=1)\) .
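The behaviour of this curve can be reproduced with a short Python sketch. It assumes that the Bernoulli law of Example 16 reads \( P[\omega]=\theta^{k(\omega)}(1-\theta)^{N-k(\omega)}\) , and it orders the words lexicographically with \( H\) before \( T\) . The function name `coin_curve` is hypothetical.

```python
from itertools import product

def coin_curve(theta, N):
    """Return the words of {H, T}^N in lexicographic order and their probabilities.

    Assumption: P[w] = theta**k * (1 - theta)**(N - k), with k the number of heads in w.
    """
    words = ["".join(w) for w in product("HT", repeat=N)]
    probs = [theta ** w.count("H") * (1 - theta) ** (N - w.count("H")) for w in words]
    return words, probs

words, start = coin_curve(0.0, 2)   # vertex P[TT] = 1
_, center = coin_curve(0.5, 2)      # center of the simplex
_, end = coin_curve(1.0, 2)         # vertex P[HH] = 1
print(words, start, center, end)

# k(omega_j) compared with the number of zeros in the N-bit binary expansion of j - 1
N = 3
words3, _ = coin_curve(0.5, N)
heads = [w.count("H") for w in words3]
zero_counts = [format(j, f"0{N}b").count("0") for j in range(2 ** N)]
print(heads == zero_counts)
```

The last comparison uses the \( N\) -bit binary expansion (with leading zeros), which is what makes the count of zeros agree with the number of heads.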
More generally, the flipping coin example can be re-contextualized as follows.
Example 18
For finite probabilities, we may use the multinomial distribution, with sample space \( \Omega= \{\omega_1,\dots,\omega_m\}\) and \( \sigma\) -algebra \( {\mathbf{S}}= {\mathcal P}(\Omega)\) , the set of all subsets of \( \Omega\) , which is generated by the one-element sets \( \{\omega_i\}\) . The probability is defined by
(66)
with
and this induces
(67)
Example 19
In the case of countably infinite probabilities, let \( \Omega=\{\omega_1,\omega_2,\dots\}\) and let the \( \sigma\) -algebra be generated by the elementary sets \( \{\omega_i\}\) . The probability is defined by its value on these sets such that:
(68)
A typical example of such probability distribution is the Poisson distribution,
(69)
In the case of continuous probability distributions with respect to the Lebesgue measure, the sample space is identified with \( \mathbb{R}\) , and \( {\mathbf{S}}\) is the Borel \( \sigma\) -algebra, that is, the \( \sigma\) -algebra generated by the intervals \( (-\infty,a]\) , where \( a\in {\mathbb R}\) . The probability is then given by
(70)
with
A classical example is the Gaussian distribution:
(71)
Exercise 31
Prove the following theorem on \( \sigma\) -Algebras and subcovers. Let \( (X, \mathcal{A}, \mu)\) be a measure space, and let \( \mathcal{B} \subseteq \mathcal{A}\) be a sub-\( \sigma\) -algebra of \( \mathcal{A}\) . If \( \{ A_n \}_{n \in \mathbb{N}}\) is a countable cover of \( X\) by sets in \( \mathcal{A}\) , then there exists a subcover in \( \mathcal{B}\) if and only if the measure \( \mu\) on \( \mathcal{A}\) is uniquely determined by its restriction to \( \mathcal{B}\) .
In many contexts, it is beneficial to consider a notion of measure that extends beyond the confines of probability theory. Unlike a probability measure–which by definition assigns a total measure of 1 to the entire space–a general measure does not require such normalization. By relaxing the normalization condition, we can define and work with measures that better reflect the intrinsic properties of the space under consideration. This generalization opens up new possibilities for both theoretical developments and practical applications, ranging from quantum field theory to ergodic theory and beyond.
Definition 88
A measure \( \lambda\) on the measurable space \( (\Omega,{\mathbf{S}})\) is a map which assigns to each event a nonnegative real number (possibly \( +\infty\) ) such that:
A measure \( \lambda\) is \( \sigma\) -finite if
Remark 9
We draw the reader's attention to the fact that a \( \sigma\) -finite measure is in general not finite. An example is the Lebesgue measure \( \lambda\) on the real line, which is finite on every bounded interval, \( \lambda([a,b]) = \mid b-a\mid\) , but assigns infinite measure to the whole line.
Exercise 32
If \( X\) is a metric space and \( A\) and \( B\) are two disjoint closed subsets of \( X\) , then there exists a continuous function \( f(x)\) on \( X\) with properties:
1) \( 0\leq f(x)\leq 1\)
2)
Let us first consider a random experiment described by the probability space \( (\Omega,\mathbf{S},P)\) . A real random variable is a map from the sample space \( \Omega\) to a subset \( E\subset \mathbb{R}\) which preserves the algebra of events.
Definition 89 (Measurable function)
Let \( (E, \mathcal{B})\) be a measurable space. A measurable function \( f\) from \( (\Omega,\mathbf{S})\) to \( (E, \mathcal{B})\) is a function such that the inverse image of every measurable set in \( \mathcal{B}\) is a measurable set in \( \mathbf{S}\) .
Definition 90 (Random variable)
Let \( (\Omega, \mathbf{S},P)\) be a probability space. An \( E\) -valued random variable \( X\) is a class of measurable functions from \( (\Omega,\mathbf{S},P)\) to \( (E, \mathcal{B})\) , identified up to equality \( P\) -a.e. (\( P\) -almost everywhere, i.e. up to a set of \( P\) -measure \( 0\) ).
Let \( (\Omega, \mathbf{S},P)\) be a probability space. The integral over \( \Omega\) of a random variable \( X\) , if it exists, is called the expectation (or mean) of \( X\) , and is denoted
That is, \( \mathbb{E}_P[X]\) exists if \( X\in L(\Omega,\mathbf{S},P)\) .
Definition 91
If \( \mu \) and \( \nu\) are two measures on the same measurable space \( (\Omega,\mathbf{S})\) , \( \mu \) is said to be absolutely continuous with respect to \( \nu \) if \( \mu ( A)=0\) for every set \( A\) for which \( \nu (A)=0\) . This is written as "\( \mu \ll \nu \) ". That is: \( \mu\ll \nu \text{ if and only if for all } A \in \mathbf{S},\ (\nu (A)=0 \Rightarrow \mu (A)=0).\)
When \( \mu \ll \nu \) , \( \nu \) is said to dominate \( \mu \) .
Absolute continuity of measures is:
If \( \mu \ll \nu \) and \( \nu \ll \mu\) , the measures \( \mu \) and \( \nu \) are said to be equivalent. Thus absolute continuity induces a partial ordering of such equivalence classes.
Exercise 33
As a concrete example of a dominating measure, let \( \{P_k\}\) be a finite or countable family of probabilities on the same measurable space \( (\Omega,\mathbf{S})\) and consider
the arithmetic mean of a finite collection of probability measures:
or, in the case of a countable family, the weighted series:
Prove that \( P_k \ll P_0\)
If \( \mu \) is a signed or complex measure, \( \mu\) is said to be absolutely continuous with respect to \( \nu \) if its variation \( |\mu |\) satisfies \( |\mu |\ll \nu\) ; equivalently, if every set \( A\) for which \( \nu (A)=0\) is \( \mu \) -null.
Theorem 15 (The Radon–Nikodym theorem)
If \( \mu\) is absolutely continuous with respect to \( \nu\) , and both measures are \( \sigma\) -finite, then \( \mu\) has a density, or "Radon–Nikodym derivative", with respect to \( \nu\) , which means that there exists a \( \nu\) -measurable function \( \rho\) taking values in \( [0,+\infty )\) , denoted by \( \rho=d\mu /d\nu\) , such that for any \( \nu\) -measurable set \( \mathcal{A}\) we have:
If \( \mu=P\ll \nu\) is a probability and \( \nu\) is a positive \( \sigma\) -finite measure, the Radon–Nikodym derivative \( \rho=\frac{dP}{d\nu}\) is a random variable, called density (or distribution) of \( P\) with respect to the measure \( \nu\) . We have
In specific cases, a probability measure \( \mu\) can behave as a dominating measure for another measure \( \nu\) if \( \nu \ll\mu.\)
The family of probability densities of (parametric) probability measures is denoted:
We consider the case when \( \mathsf{S}\) is a smooth topological manifold.
Probability distributions in a statistical manifold of exponential type are such that there exist parameterizations \( \theta\) satisfying:
where parameters \( \theta\) and random variables \( X=(X^i)_{i=1}^n\) are chosen adequately, \( \psi(\theta)\) is a potential function.
In information geometry, an important question is whether a given family of probability distributions can be recognized as an exponential family or, more generally, whether it exhibits an exponential structure. Recall that a probability distribution belongs to an exponential family if its density function can be written in the form:
where:
This representation is not merely a convenient rewriting; it endows the parameter space with a rich geometric structure. In particular, when a family of distributions is exponential, the Fisher information metric \( g_{ij}=\mathbb{E}[ \partial_i \ln \rho \; \partial_j \ln \rho]\) and the dual affine connections (which emerge naturally in this setting) yield a dually flat space. This flatness simplifies the study of geodesics, divergence functions, and many aspects of statistical inference.
A classical example is the normal (Gaussian) family. While the normal distribution indeed constitutes an exponential family, its precise exponential form depends critically on the choice of parametrization.
Exercise 34
The normal (Gaussian) distribution on \( (\mathbb{R}, \mathbf{B},\lambda)\) , where \( \mathbf{B}\) is the Borel \( \sigma\) -algebra on \( \mathbb{R}\) and \( \lambda\) is the Lebesgue measure, \( \lambda[(a,b)]=\vert b-a\vert\) , is defined by its density with respect to \( \lambda\) : \[ \rho_{\mu,\sigma}(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),\qquad \mu\in\mathbb{R},\ \sigma>0. \]
Is the normal family exponential? If yes, give a proof.
The specific choice of parametrization affects not only the expression of the density but also the form of the Fisher information metric and the associated affine connections. Thus, determining whether a family of probability distributions is of exponential type–and selecting an appropriate parametrization in cases like the normal family–is fundamental in information geometry. It allows one to exploit powerful geometric techniques to derive insights into the behavior of statistical models, facilitating tasks such as parameter estimation, hypothesis testing, and the study of statistical divergences.
Let \( u\in L^{2}(\Omega,\mathbf{S}, P_{\theta})\) be a tangent vector to the manifold \( \mathsf{S}\) (of dimension \( n\) ) of probability distributions (we assume of exponential type) at the point \( P_{\theta}\) .
where \( \mathbb{E}_{P_{\theta}}[u]\) is the expectation value, with respect to the probability \( {P_{\theta}}\) .
The tangent space to the considered manifold \( \mathsf{S}\) is (locally) isomorphic to the \( n\) -dimensional linear space generated by the family of (centered) random variables (called the score vector) \( (\partial_{i}\ell_\theta)_{i=1,\dots,n}\) , where \( \ell_{\theta} = \ln \rho_{\theta}.\)
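As a numerical illustration of centered score variables, the sketch below takes the Gaussian family in the \( (\mu,\sigma)\) -parametrization. The closed-form scores used in `scores` are the standard derivatives of the log-density and are stated here as an assumption rather than taken from the text. Expectations are approximated by a midpoint rule, and all function names are hypothetical.

```python
import math

def gaussian_density(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def scores(x, mu, sigma):
    """Partial derivatives of ln(density) with respect to mu and sigma."""
    d_mu = (x - mu) / sigma ** 2
    d_sigma = ((x - mu) ** 2 - sigma ** 2) / sigma ** 3
    return d_mu, d_sigma

def expectation(func, mu, sigma, half_width=12.0, steps=4000):
    """Midpoint-rule approximation of E[func(X)] for X ~ N(mu, sigma^2)."""
    lower = mu - half_width * sigma
    step = 2.0 * half_width * sigma / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * step
        total += func(x) * gaussian_density(x, mu, sigma) * step
    return total

mu, sigma = 1.5, 2.0

# Means of the two score variables: both should vanish up to quadrature error.
mean_scores = [
    expectation(lambda x, i=i: scores(x, mu, sigma)[i], mu, sigma)
    for i in range(2)
]

# Covariance matrix of the score vector (the scores are centered).
fisher = [
    [
        expectation(
            lambda x, i=i, j=j: scores(x, mu, sigma)[i] * scores(x, mu, sigma)[j],
            mu, sigma,
        )
        for j in range(2)
    ]
    for i in range(2)
]
print(mean_scores)
print(fisher)
```

Under these assumptions the covariance matrix should be close to the diagonal matrix with entries \( 1/\sigma^2\) and \( 2/\sigma^2\) .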
Remark 10
Densities are random variables that are positive almost everywhere; the tangent vectors are signed (that is, real-valued) measures with vanishing mean value.
Exercise 35
Show that \( \mathbb{E}_{P_{\theta}}[u]=0\) .
Exercise 36
Show that the tangent space \( \mathcal{T}_{\mu,\sigma} \) to the manifold of Gaussian distribution (34) is spanned, in the \( (\mu, \sigma)\) -parametrization, by the two random variables
Deduce that the plane \( \mathcal{T}_{\mu,\sigma}\) consists of all the quadratic polynomials in \( x\) whose expectation vanishes:
In the basis \( \partial_i\ell_{\theta}\) the (Fisher–Rao) metric is the covariance matrix of the score vector:
with
where \( \{a^{i}\}\) form a dual basis to \( \{\partial_j\ell_{\theta}\}\) :
and
Suppose that we perturb infinitesimally \( \theta \) such that \( \theta'=\theta+d\theta\) . Consider the linear map
(this map depends on \( d\theta\) and reduces to the identity map as \( d\theta\) tends to 0).
The given vector
is mapped to \( (u^k+d\theta^i\Gamma_{ij}^ku^j)\partial_k\in\mathcal{T}_{\theta}\) .
This construction allows one to establish a correspondence between the vectors in \( \mathcal{T}_{\theta}\) and \( \mathcal{T}_{\theta'}\) .
The functions \( \Gamma_{ij}^k(\theta)\) are the coefficients of the affine connection.
Let \( \pi:E\to \mathsf{S}\) be a vector bundle. A covariant derivative (also known as a connection) is an \( \mathbb{R}\) -bilinear map
Consider the intrinsic change in the \( j\) -th basis vector \( \partial_j (\theta)\) as \( \theta\) deforms into \( \theta'\) , in the direction of \( \partial_i\) . We obtain the following vector field:
This is a covariant derivative of the vector field \( \partial_j\) along \( \partial_i\) . It is determined from the coefficients of the affine connection (namely \( \Gamma_{ij}^k(\theta)\) ).
There exists a skewness tensor. It is a fully symmetric covariant tensor of rank 3:
given by
so that we have:
The skewness tensor was introduced to formalize the notion of statistical curvature, via the affine connections \( \{\nabla^{\alpha}\}_{\alpha\in \mathbb{R}}\) . We have the following:
for any pair of vector fields \( X, Y\) over \( \mathsf{S}\) ; \( \nabla^{0}\) is the Levi–Civita connection and \( (\cdot)\) is the “contraction” (of two tensors).
The coefficients of the affine connection are:
This is called the \( \alpha\) -connection.
Recall that the third order Amari–Chentsov tensor is defined to be:
We use this tensor to simplify calculations of the \( \alpha\) -connection i.e.:
The \( \alpha\) -connection has a meaning of its own, depending on \( \alpha\) . It plays, for example, an important role in statistical inference.
Exercise 30. It is easy to show it using basic set theory.
Exercise 31.
1. Second-Countability of \( X \) :
By definition, \( X \) is second-countable, meaning it has a countable basis \( \mathcal{B} = \{B_n\}_{n \in \mathbb{N}} \) for its topology. That is, every open set in \( X \) can be written as a union of elements of \( \mathcal{B} \) .
2. Open Cover \( \mathcal{U} \) :
Let \( \mathcal{U} = \{U_\alpha\}_{\alpha \in I} \) be an open cover of \( X \) . This means: \[ X=\bigcup_{\alpha\in I}U_\alpha. \]
3. Constructing a Countable Subcover:
For each \( x \in X \) , there exists some \( U_\alpha \in \mathcal{U} \) such that \( x \in U_\alpha \) .
Since \( \mathcal{B} \) is a basis and \( U_\alpha \) is open, there exists some \( B_{n_x} \in \mathcal{B} \) such that: \[ x\in B_{n_x}\subseteq U_\alpha. \]
Let \( \mathcal{B}' = \{B_{n_x} \mid x \in X\} \) . This is a subset of \( \mathcal{B} \) , and since \( \mathcal{B} \) is countable, \( \mathcal{B}' \) is also countable.
4. Extracting a Countable Subcover:
For each \( B_{n_x} \in \mathcal{B}' \) , choose one \( U_\alpha \) such that \( B_{n_x} \subseteq U_\alpha \) . Let \( \mathcal{U}' \) be the collection of all such \( U_\alpha \) .
Since \( \mathcal{B}' \) is countable, \( \mathcal{U}' \) is also countable.
5. Verification that \( \mathcal{U}' \) is a Cover:
For any \( x \in X \) , there exists \( B_{n_x} \in \mathcal{B}' \) such that \( x \in B_{n_x} \subseteq U_\alpha \) for some \( U_\alpha \in \mathcal{U}' \) .
Thus, \( x \in U_\alpha \) , and since \( x \) was arbitrary, \( \mathcal{U}' \) covers \( X \) .
6. Conclusion:
We have constructed a countable subcover \( \mathcal{U}' \) of \( \mathcal{U} \) , proving the theorem.
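Steps 3 to 5 of this argument can be mimicked on a finite toy example, where countability is automatic but the selection mechanism is the same. The sketch below is only an illustration under that simplification; `extract_subcover` and the sample data are hypothetical.

```python
def extract_subcover(points, basis, cover):
    """Select a subcover by going through basis sets, as in the proof.

    points: set of points of X
    basis:  list of sets B_n (a basis of the topology)
    cover:  list of open sets U_alpha whose union is X
    Returns the sorted indices of the selected cover elements.
    """
    chosen = {}  # basis index n -> index of one cover element containing B_n
    for x in sorted(points):
        for n, b in enumerate(basis):
            if x not in b:
                continue
            # Look for some U_alpha with B_n contained in U_alpha.
            alpha = next((a for a, u in enumerate(cover) if b <= u), None)
            if alpha is not None:
                chosen.setdefault(n, alpha)  # one U_alpha per basis set
                break
        else:
            raise ValueError(f"no basis set around {x} fits inside the cover")
    return sorted(set(chosen.values()))

points = {0, 1, 2, 3}
basis = [{0}, {1}, {2, 3}]
cover = [{0, 1}, {0}, {1, 2, 3}, {2, 3}, {0, 1, 2, 3}]

subcover = extract_subcover(points, basis, cover)
print([cover[i] for i in subcover])
```

Because at most one cover element is kept per basis set, the subcover is never larger than the basis, which is the finite counterpart of the countability argument.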
Exercise 32.
If \( \inf_{x\in A,y\in B} d(x,y)=\delta>0\) then the function \( f\) can be chosen to be uniformly continuous.
Let \( f(x)=d(x,A)/[d(x,A)+d(x,B)]\) . It is easy to verify that \( f\) satisfies both conditions 1 and 2. The other part of the theorem follows from the fact that \( d(x,A)+d(x,B)\geq \delta\) and from the fact that the function \( d(x,A)\) satisfies the inequality \[ |d(x,A)-d(y,A)|\leq d(x,y). \]
In particular, \( d(x,A)\) is uniformly continuous.
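The construction can be tried out on the real line. The sketch below assumes that \( A\) and \( B\) are disjoint closed intervals (a choice made only for illustration) and checks numerically that the difference quotients of \( f\) stay below \( 1/\delta\) , which is the Lipschitz bound one obtains by combining the two facts above. The helper names are hypothetical.

```python
def dist_to_interval(x, interval):
    """Distance from the real number x to the closed interval [lo, hi]."""
    lo, hi = interval
    return max(lo - x, 0.0, x - hi)

def separating_function(x, a, b):
    """f(x) = d(x, A) / (d(x, A) + d(x, B)) for disjoint closed intervals A, B."""
    da, db = dist_to_interval(x, a), dist_to_interval(x, b)
    return da / (da + db)

A, B = (0.0, 1.0), (2.0, 3.5)
delta = 1.0  # distance between the two intervals

grid = [-2.0 + 0.01 * k for k in range(801)]  # sample points in [-2, 6]
values = [separating_function(x, A, B) for x in grid]

# Largest difference quotient between neighbouring grid points.
pairs = list(zip(grid, values))
max_slope = max(
    abs(v2 - v1) / (x2 - x1) for (x1, v1), (x2, v2) in zip(pairs, pairs[1:])
)
print(max_slope)
```

In this example \( f\) vanishes on \( A\) , equals 1 on \( B\) , and takes values in \( [0,1]\) elsewhere.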
Exercise 33. The proof is omitted since it is considered in Ex. 7.2.
Exercise 35. The proof is omitted since it is a straightforward calculation.
Exercise 34 and Exercise 36. By an appropriate change of variables, the Gaussian density can be rewritten in exponential family form, revealing its natural parameters and sufficient statistics.
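One candidate change of variables can be checked numerically. The sketch below assumes the natural parameters \( \theta_1=\mu/\sigma^2\) and \( \theta_2=-1/(2\sigma^2)\) , the sufficient statistics \( (x, x^2)\) , and the potential \( \psi(\theta)=-\theta_1^2/(4\theta_2)+\tfrac12\ln(-\pi/\theta_2)\) . These are a standard choice, stated here as an assumption and not as the parametrization intended by the exercise.

```python
import math

def gaussian_density(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def natural_parameters(mu, sigma):
    """Candidate natural parameters paired with the statistics (x, x^2)."""
    return mu / sigma ** 2, -1.0 / (2.0 * sigma ** 2)

def potential(theta1, theta2):
    """Candidate potential psi(theta); requires theta2 < 0."""
    return -theta1 ** 2 / (4.0 * theta2) + 0.5 * math.log(-math.pi / theta2)

def exponential_form(x, theta1, theta2):
    """exp(theta1 * x + theta2 * x^2 - psi(theta))."""
    return math.exp(theta1 * x + theta2 * x ** 2 - potential(theta1, theta2))

mu, sigma = 0.7, 1.3
theta1, theta2 = natural_parameters(mu, sigma)

# Largest discrepancy between the two expressions on a few sample points.
max_gap = max(
    abs(gaussian_density(x, mu, sigma) - exponential_form(x, theta1, theta2))
    for x in [-3.0, -1.0, 0.0, 0.7, 2.5, 5.0]
)
print(max_gap)
```

A vanishing discrepancy (up to rounding) is consistent with the exponential form but is of course not a proof; the algebraic verification is the content of the exercise.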
In this chapter, we embark on a journey into probability and statistics in a manner that may initially surprise the reader: through the lens of philosophy. Rather than taking the conventional route of axiomatic probability theory, we begin with a deeper reflection on the nature of mathematical structures and their underlying symmetries. Our point of departure is a generalization of Felix Klein’s revolutionary insight–his Erlangen Program–which reinterpreted geometry as the study of figures and their congruences under transformation groups.
We invite the reader to consider an analogous perspective in the realm of probability distributions. Just as Klein’s figures reside in a structured mathematical world shaped by transformations, we view probability distributions as abstract entities within a broader conceptual space. This space, much like Plato’s world of ideas, consists of idealized probabilistic objects related by structural equivalences. These equivalences, or congruences, form the foundation upon which we build a categorical framework for probability theory.
This shift in perspective naturally leads us to the introduction of categories of probability distributions, where morphisms arise from Markov kernels–stochastic mappings that embody the very notion of congruence in this setting. In this way, Markov kernels serve as the probabilistic analogues of Klein’s geometric transformations, revealing deeper structures in statistical theory that might otherwise remain hidden.
Through this approach, we illuminate the rich interplay between probability, statistics, and abstract mathematical structures, setting the stage for a unifying perspective that extends beyond classical interpretations.
Let us begin by considering a broad and fundamental question: what is, in general, the aim of geometry?
One can state succinctly that geometry is primarily concerned with:
The foundations of geometry, as formulated by Klein, are based on the study of geometric figures and the transformations that act upon them. This perspective, known as the Erlangen program, emphasizes the structural relationships between figures rather than their specific realizations. In its classical setting, this formalism provides a natural framework for understanding symmetries, invariants, and classification problems within geometry.
We, however, will think of those (Klein) figures as figures in the sense of Plato. Plato’s world of geometric figures refers to his philosophical concept of the World of Forms, where perfect and unchanging geometric shapes exist independently of the physical world.
In Plato’s philosophy, as described in dialogues such as the Republic and the Phaedo, mathematical objects such as circles, triangles, and other ideal forms exist not merely as physical approximations but as pure, abstract entities in a higher, non-material realm.
It is precisely this concept of Plato’s world of figures/forms which is used in Klein’s perception and which we will use also to describe geometric structures in probability and measure theory.
In the present work, we extend this viewpoint beyond its original domain by considering a formalism in which Klein’s approach is adapted to the realm of probability theory and statistics. The motivation for such a generalization arises naturally: if one seeks to construct a statistical geometry, it is necessary to first identify the appropriate notion of geometric figures and the corresponding transformations that preserve the relevant structures.
Our approach follows a twofold strategy. First, we identify probabilistic objects that play the role of geometric figures in this generalized setting. These may include, for instance, families of probability distributions or spaces of statistical models. Second, we introduce transformations between these objects that preserve key statistical or probabilistic properties.
The fundamental problem then becomes:
how does one define and classify these transformations, and what structures do they induce?
From this perspective, Markov categories and other categories called \( CAP\) , \( CAPH\) , \( FAMH\) emerge as natural generalizations of geometric figures in Klein’s sense, and their transforms. These transformations–given by statistical morphisms, stochastic maps, or Markov kernels–serve as the analogues of geometric transformations in classical geometry. This interpretation provides a unifying language in which statistical inference, decision theory, and information geometry can be understood in terms of an overarching geometric framework.
The goal of this work is to explore this formalism systematically and to establish the necessary mathematical structures that allow for a coherent treatment of statistical geometry. In particular, we shall investigate the role of categorical constructions, invariants under transformations, and the emergence of new geometric notions adapted to probabilistic contexts. Ultimately, our approach seeks to generalize Klein’s original vision in such a way that its extension to statistics appears as a natural and inevitable development.
The Klein figure formalism admits a natural extension to probabilistic contexts, particularly in the study of parametrized families of probability distributions. This adaptation establishes a coherent theoretical framework that facilitates systematic analysis and operational flexibility within such probabilistic structures.
The most elementary class of geometric objects consists of subsets of a given space, referred to by Felix Klein as figures.
Since any figure can be regarded as a set of points, one is naturally led to a generalization of this concept of geometry–extending beyond the study of figures in the classical sense. Alongside figures in Klein’s sense, which are simply sets, one may consider parametrized sets, thereby introducing a new geometric framework to examine their intrinsic properties.
Let \( \Theta\) be an index set parametrizing such a collection, and let us construct an epimorphism from this parameter set onto the space under consideration. That is, for each parameter value \( \theta \in \Theta\) , a corresponding point of the considered figure is assigned. In general, a single point of the figure may be associated with multiple parameter values, reflecting a richer underlying structure.
Following Klein’s approach to geometry, we may consider parametrized sets as Klein figures, where the parametrization is defined in a specific way. Such examples include:
Furthermore, in the scope of generalizing objects and structures, we can recall what a category is, as we will need it later.
A category is a mathematical structure that captures relationships between objects in an abstract way. At its core, a category consists of objects and morphisms (arrows) between them, which satisfy rules of composition and identity. One should think of a category not just as a formalism but a lens through which mathematics reveals its hidden symmetries and harmonies.
To be more precise, a category \( \mathscr{C}\) consists of the following data:
These morphisms must satisfy the following axioms:
(1) Composition
For any three objects \( X,Y,Z\) there exists a composition map \( \circ\) which assigns to each pair of morphisms \( f:X\to Y\) and \( g:Y\to Z\) a composite morphism \[ g\circ f: X\to Z. \]
(2) Identity Morphisms
For each object \( X\) , there exists an identity morphism \( id_X: X\to X\) such that for all \( f:X\to Y\) and \( g:Y\to X\) we have \[ f\circ id_X=f,\qquad id_X\circ g=g. \]
(3) Associativity
For any morphisms \( f:X\to Y\) , \( g:Y\to Z\) , and \( h:Z\to W\) one requires: \[ h\circ(g\circ f)=(h\circ g)\circ f. \]
One extends Klein’s notion of figures beyond these classical cases, namely to the framework of probabilities. The following are examples of generalized Klein figures.
(Indeed, any finite or countable family \( \{P_k\}\) of probability measures on a finite \( \sigma\) -algebra is necessarily dominated, as it admits a dominating measure.)
Using the Klein figure formalism, an essential ingredient in constructing a geometric framework for probabilistic or statistical quantities is the study of transformations acting on generalized Klein figures. We now introduce such transformations.
Definition 92
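Let \( (\Omega, \mathbf{S})\) and \( (\Omega', \mathbf{\Sigma})\) be two measurable spaces. A Markov transition probability, or equivalently, a Markov morphism, is a real-valued function \[ \Pi(\omega,\mathcal{A}) \]
defined for \( \omega \in \Omega\) and \( \mathcal{A} \in \mathbf{\Sigma}\) , such that for each fixed \( \omega\) the set function \( \Pi(\omega,\cdot)\) is a probability measure on \( \mathbf{\Sigma}\) , and satisfying the condition that for each fixed \( \mathcal{A}\) , the function \[ \omega\mapsto\Pi(\omega,\mathcal{A}) \]
is \( \mathbf{S}\) -measurable.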
This transition probability distribution describes a Markovian random transition from the measurable space \( (\Omega, \mathbf{S})\) to \( (\Omega', \mathbf{\Sigma})\) . The following theorem establishes an extension property for such transformations.
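Between finite spaces, a Markov transition probability is simply a row-stochastic matrix, and the category axioms recalled earlier can be checked directly. The sketch below is an illustration under that finiteness assumption; `compose`, `identity` and `is_markov` are hypothetical names, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F

def compose(first, second):
    """Kernel 'first, then second': entry [i][k] = sum_j first[i][j] * second[j][k]."""
    return [
        [sum(row[j] * second[j][k] for j in range(len(second)))
         for k in range(len(second[0]))]
        for row in first
    ]

def identity(n):
    """Identity kernel on an n-point space (each point stays where it is)."""
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def is_markov(kernel):
    """Each row must be a probability vector."""
    return all(sum(row) == 1 and all(p >= 0 for p in row) for row in kernel)

# Kernels between finite spaces of sizes 2 -> 3 -> 2 -> 2.
p = [[F(1, 2), F(1, 2), F(0)], [F(0), F(1, 3), F(2, 3)]]
q = [[F(1), F(0)], [F(1, 4), F(3, 4)], [F(0), F(1)]]
r = [[F(1, 5), F(4, 5)], [F(1, 2), F(1, 2)]]

pq = compose(p, q)
associative = compose(compose(p, q), r) == compose(p, compose(q, r))
unital = compose(identity(2), p) == p and compose(p, identity(3)) == p
print(pq, associative, unital)
```

The composite of two kernels is again row-stochastic, which is the finite counterpart of the statement that Markov morphisms are closed under composition.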
Let us now consider a classical example of a statistical problem.
We illustrate the congruences mentioned in the first section of this chapter with the concrete example presented below.
Example 20
We are interested in estimating the parameters of a normal distribution from a set of \( N\) independent observations \( \{x^{(1)},...,x^{(N)}\}\) , where the distribution has a density given by:
Here, \( \mu\) represents the mean, and \( \sigma^2\) denotes the variance of the distribution.
From a geometric perspective, the set of all probability distributions on the real line that are absolutely continuous with respect to the Lebesgue measure forms an infinite-dimensional manifold. When we restrict our attention to the family of normal distributions parameterized by \( (\mu,\sigma)\) , with densities of the form given above, we obtain a two-dimensional surface within this infinite-dimensional space. This surface consists of all normal distributions defined on the Borel \( \sigma\) -algebra of the real line.
The following theorem provides a condition under which two families of probability distributions are considered congruent, meaning that they share an underlying structural equivalence in terms of how their probability measures are defined. This gives rise to a certain notion of universal property, since it is reminiscent of certain universal properties in algebraic geometry.
Theorem 16 (Universal property)
Let us consider two families of probability distributions, denoted \( w_1\) and \( w_2\) , which are defined as
where these families are probability distributions indexed by a parameter \( \theta\) , meaning that for all \( \theta\in \Theta\) we have a corresponding probability measure on the spaces \( (\Omega^{(1)},\mathbf{S}^{(1)})\) and \( (\Omega^{(2)},\mathbf{S}^{(2)})\) , respectively.
The two families \( w_1\) and \( w_2\) are said to be congruent if there exists:
A measurable mapping \( \epsilon=\varphi_{i}(\omega)\) of the measure spaces \( (\Omega^{(i)},\mathbf{S}^{(i)})\) , on which they are defined, onto a finite space
A common family of probability distributions:
on the space \( \mathscr{E}\) , which serves as an intermediate representation of the distributions in both families.
A family of transition distributions:
that describes how the probability measure on \( \mathscr{E}\) is "transferred" back to the original space \( \Omega^{(i)}\) . This relation is formalized as:
where the key consistency condition is such that
This ensures that each outcome \( \omega^{(i)}_{j}\) belongs to the preimage of its corresponding \( \epsilon\) , meaning that \( \varphi_i\) correctly classifies elements into equivalence classes in \( \mathscr{E}\) .
Example 21
Let us apply this theorem on congruences between families of distributions to a congruence between certain simplices, that contain them. The original families of probability distributions can be described within each simplex using the same barycentric coordinates. This allows us to focus our study on the structure of equivalent families, given by simplices in a special position. Such families will be called maximal.
More precisely, consider two simplices:
For each face of the original simplex \( Cap\) , determined by a subset \( E_i\) of its vertices, we select a point with barycentric coordinates \( P_{E_i}^{(1)}[\omega]\) , for each \( \omega \in E_i\) and for each \( i\in \{1,\dots,m\}\) . We assume that all but one of these points are chosen in the interior of their respective faces.
The collection of these points \( P_{E_i} [\cdot]\) forms the set of vertices of an \( m\) -dimensional subsimplex. This subsimplex is equivalent to a similarly constructed \( m\) -dimensional subsimplex in the second simplex. Through this construction, we establish a structural equivalence between the two families in terms of their representation within simplicial geometry.
Theorem 17
Let \( \Pi(\omega, \mathcal{A})\) be a transition probability distribution from the measure space \( (\Omega, \mathbf{S})\) to \( (\Omega', \mathbf{\Sigma})\) . Then, by extending the probability distribution
to a probability distribution
we obtain a transition probability distribution from \( (\Omega, \mathbf{S^*})\) to \( (\Omega', \mathbf{\Sigma^*})\) .
Consequently, for any probability distribution \( P\) , we have:
Remark 11
We make a link between the statement outlined above and the universal property of Theorem 16. Theorem 16 says that a bijective correspondence is established by the operators \( \Pi_{ij}\) , which are given by
where \( \Pi^{(i)}\) is given by the mapping \( \varphi_i(\omega)\) .
Exercise 37
The proof is left as an exercise. However, since it is a very ambitious exercise, we suggest that the reader looks up the definition of an inner and outer measure.
The following theorem shows the existence of a natural subcategory, which we denote by \( CAP^*\) , within the category \( CAP\) of measurable spaces and Markov morphisms.
Theorem 18 (Subcategory of Markov Morphisms)
Let \( (\Omega, \mathbf{S})\) be a measurable space whose \( \sigma\) -algebra \( \mathbf{S}\) is closed, and let \( CAP\) denote the category of measurable spaces with Markov morphisms. Then the subclass of such spaces, together with all their Markov morphisms, forms a subcategory \( CAP^*\) of \( CAP\) . Moreover, the natural extension of each probability measure from \( \mathbf{S}\) to its closure \( \mathbf{S}^*\) defines a functor mapping \( CAP\) into \( CAP^*\) .
Proof
By definition, any collection of objects along with all the corresponding morphisms (when these morphisms are suitably restricted) forms a subcategory. The key point is that the operations of extending a probability measure and of applying a Markov morphism commute. Indeed, suppose that for a Markov morphism \( \Pi_{01}\) we have a corresponding probability measure \( P_\omega\{.\} = \Pi_{01}(\omega; \cdot)\) , and let \( \Pi_{12}\) be another Markov morphism. Then, for every \( \omega \in \Omega\) , one may show that
in the sense that the extension of the composite is equal to the composite of the extensions. In particular,
This commutativity property ensures that the extension procedure defines a functor from \( CAP\) to \( CAP^*\) .
Remark 12
The passage to the closure of the algebra in the category of measurable spaces is functorial.
We now show that Markov morphisms form a category in a natural way.
Lemma 2
Let \( \Pi(\omega';\omega'')\) be a Markov morphism from \( (\Omega', \mathbf{S}')\) to \( (\Omega'', \mathbf{S}'')\) , and let \( f(\omega'')\) be a measurable real function on \( (\Omega'', \mathbf{S}'')\) with
Then, the function
is a measurable function on \( (\Omega', \mathbf{S}')\) and satisfies
Proof
The proof follows from the monotonicity and linearity of the integral, combined with the fact that \( \Pi(\omega'; \cdot)\) is a probability measure for each fixed \( \omega'\) .
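On finite spaces the integral in the lemma reduces to a weighted sum, and the role of monotonicity and linearity becomes transparent. The sketch below assumes that the bound in the lemma is of the form \( \inf f\leq g\leq \sup f\) (the displayed inequalities are not reproduced here, so this is an assumption); `push_function` is a hypothetical name.

```python
def push_function(kernel, f):
    """g(w') = sum over w'' of kernel[w'][w''] * f[w''] (finite analogue of the integral)."""
    return [sum(p * value for p, value in zip(row, f)) for row in kernel]

# A kernel from a 4-point space to a 3-point space: each row is a probability vector.
kernel = [
    [0.2, 0.5, 0.3],
    [0.0, 1.0, 0.0],
    [0.6, 0.1, 0.3],
    [0.25, 0.25, 0.5],
]
f = [-1.0, 2.0, 4.0]  # a bounded function on the target space

g = push_function(kernel, f)
within_bounds = all(min(f) <= value <= max(f) for value in g)
print(g, within_bounds)
```

Each value of `g` is a convex combination of the values of `f`, which is why it cannot leave the interval between the smallest and largest value of `f`.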
We now state a fundamental theorem concerning dominated families (denoted DomFam) in the context of Markov morphisms.
Theorem 19
Relative to the category of Markov morphisms, the property of being dominated is absolutely invariant; that is, if a measure \( \mu\) on \( (\Omega, \mathbf{S})\) dominates another measure \( \nu\) (denoted \( \mu \gg \nu\) ), then for any Markov morphism \( \Pi\) from \( (\Omega,\mathbf{S})\) to \( (\Omega',\mathbf{\Sigma})\) the induced measure satisfies \[ \mu\Pi \gg \nu\Pi. \]
Proof
Let \( \mu\) be a nonnegative measure on \( (\Omega, \mathbf{S})\) such that \( \mu \gg \nu\) , meaning that for any set \( A \in \mathbf{S}\) , if \( \mu(A)=0\) , then \( \nu(A)=0\) . Let \( \Pi(\omega; d\omega')\) be a transition distribution determining a Markov morphism from \( (\Omega,\mathbf{S})\) to \( (\Omega',\mathbf{\Sigma})\) .
To show that \( \mu\Pi \gg \nu\Pi\) , where \( (\mu\Pi)(\mathcal{A})=\int_\Omega \Pi(\omega;\mathcal{A})\,\mu(d\omega)\) , it suffices to prove that for any \( \mathcal{A} \in \mathbf{\Sigma}\) with \( (\mu \Pi)(\mathcal{A}) = 0\) , we have also \( (\nu \Pi)(\mathcal{A}) = 0\) . For a fixed \( \mathcal{A}\) , define the set \[ B=\{\omega\in\Omega \mid \Pi(\omega;\mathcal{A})>0\}. \]
It is known that if \( f: \Omega \to [0,1]\) is a measurable function with \( \int_\Omega f(\omega) \mu(d\omega)=0\) , then \( \mu\{ \omega \mid f(\omega) > 0 \} = 0\) . Hence, taking \( f(\omega)=\Pi(\omega; \mathcal{A})\) , we deduce that \[ \mu(B)=0. \]
Since \( \mu \gg \nu\) , it follows that \( \nu(B)=0\) . Furthermore, for \( \omega \in \Omega \setminus B\) , we have \( \Pi(\omega; \mathcal{A}) = 0\) . Thus, \[ (\nu\Pi)(\mathcal{A})=\int_{B}\Pi(\omega;\mathcal{A})\,\nu(d\omega)+\int_{\Omega\setminus B}\Pi(\omega;\mathcal{A})\,\nu(d\omega)=0. \]
This completes the proof of the invariance of the domination relation.
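The invariance can be observed on a small finite example, where measures are vectors and a Markov morphism is a row-stochastic matrix. This is only a sketch under that finiteness assumption; `push_measure` and `dominates` are hypothetical names.

```python
from fractions import Fraction as F

def push_measure(measure, kernel):
    """(mu Pi)(w') = sum over w of mu[w] * Pi[w][w'] on finite spaces."""
    return [
        sum(m * row[k] for m, row in zip(measure, kernel))
        for k in range(len(kernel[0]))
    ]

def dominates(mu, nu):
    """mu >> nu on a finite space: nu vanishes wherever mu does."""
    return all(n == 0 for m, n in zip(mu, nu) if m == 0)

kernel = [
    [F(1, 2), F(1, 2), F(0)],
    [F(0), F(1), F(0)],
    [F(0), F(0), F(1)],
]
mu = [F(1, 2), F(1, 2), F(0)]
nu = [F(0), F(1), F(0)]

mu_pi = push_measure(mu, kernel)
nu_pi = push_measure(nu, kernel)
print(dominates(mu, nu), dominates(mu_pi, nu_pi))
```

In this example \( \mu\) dominates \( \nu\) both before and after the transition, while the reverse domination fails in both cases.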
Theorem 20
Let \( CAP\) denote the category of measurable spaces with Markov morphisms. Then the subclass of dominated measurable spaces \( (\Omega, \mathbf{S})\) whose \( \sigma\) -algebras are closed, together with all their Markov morphisms, forms a complete subcategory \( \mathsf{FAMD}\) of \( CAP\) . Moreover, the collection of all families of mutually absolutely continuous probability distributions forms a complete subcategory \( \mathsf{FAMH}\) of \( \mathsf{FAMD}\) .
Proof
We organize the proof in two parts.
(i) Dominated Families Form a Subcategory. Suppose \( Q\) is a probability measure on \( (\Omega,\mathbf{S})\) that dominates a family \( \{P_\theta\}_{\theta \in \Theta}\) , i.e., for each \( \theta\) , we have \( Q \gg P_\theta\) . By the previous theorem, for any Markov morphism \( \Pi\) from \( (\Omega,\mathbf{S})\) to \( (\Omega',\mathbf{\Sigma})\) , the extended measure \( Q\Pi\) dominates the family \( \{P_\theta\Pi\}_{\theta \in \Theta}\) . Thus, the property of domination is preserved under Markov morphisms, and the dominated families with all their Markov morphisms form a subcategory of \( FAM\) , which we denote by \( \mathsf{FAMD}\) .
(ii) Mutual Absolute Continuity. In a family of mutually absolutely continuous probability measures, each measure dominates every other. Since this domination relation is invariant under any Markov morphism, the subcategory \( \mathsf{FAMH}\) of such families is complete. Hence, we conclude that \( \mathsf{FAMD}\) is a complete subcategory of \( FAM\) , and within it, \( \mathsf{FAMH}\) forms a complete subcategory.
Remark 13
The process of passing to the closure of the \( \sigma\) -algebra in the category of measurable spaces is functorial.
We end this subsection with the following lemma.
Lemma 3
If \( P(\cdot)\in CAP(\Omega,\mathbf{S})\) is a constructible probability distribution (i.e., one that can be constructed from the family \( Q_\theta\) ), then all distributions that it dominates are also constructible.
In summary, our construction shows that the property of being dominated is invariant under Markov morphisms, and that dominated families and the families of mutually absolutely continuous distributions naturally form complete subcategories of the category of measurable spaces with Markov morphisms.
Lemma 4
The measure
is a dominating law.
Exercise 38
Show that for each \( P_k\) we have the bound:
Definition 93
A Klein geometry generated by an elementary category of topological spaces is said to be almost homogeneous or quasi-homogeneous if any two points \( a \in A\) and \( b \in B\) of any two objects \( A\) and \( B\) are totally arbitrarily approximable.
Lemma 5
Consider an almost-homogeneous Klein geometry. Every invariant function in the category of topological spaces over a scalar field is identically constant.
Proof
The proof is evident.
Definition 94
Let \( \gamma(t)\) be a curve. We say that \( \gamma(t)\) is a geodesic if the family of tangent vectors \[ \dot\gamma(t)=\frac{d\gamma}{dt}(t) \]
is parallel along the curve, that is, for all \( t\) : \[ \nabla_{\dot\gamma(t)}\dot\gamma(t)=0. \]
Remark 14
Geodesics are the analogs of straight lines in Euclidean spaces.
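In local coordinates, the geodesic equation takes the form: \[ \ddot\gamma^{k}(t)+\Gamma_{ij}^{k}(\gamma(t))\,\dot\gamma^{i}(t)\,\dot\gamma^{j}(t)=0,\qquad k=1,\dots,n. \]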
Since parametrization is essential, any other canonical parametrization \( s = s(t)\) of \( \gamma(t)\) must be an affine function of \( t\) , that is, \( s(t) = at + b\) .
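As a worked illustration, the geodesic equation can be integrated numerically once the connection coefficients are known. The sketch below assumes the Gaussian family in the \( (\mu,\sigma)\) -parametrization with the metric \( ds^2=(d\mu^2+2\,d\sigma^2)/\sigma^2\) and its Levi–Civita connection, whose nonzero coefficients \( \Gamma^{\mu}_{\mu\sigma}=\Gamma^{\mu}_{\sigma\mu}=-1/\sigma\) , \( \Gamma^{\sigma}_{\mu\mu}=1/(2\sigma)\) , \( \Gamma^{\sigma}_{\sigma\sigma}=-1/\sigma\) were computed by hand for this example; none of this is taken from the text. The integrator is a plain fourth-order Runge–Kutta scheme with hypothetical function names.

```python
def geodesic_rhs(state):
    """First-order form of the geodesic equation for ds^2 = (dmu^2 + 2 dsigma^2) / sigma^2."""
    mu, sigma, v_mu, v_sigma = state
    # Accelerations are minus Gamma^k_ij v^i v^j with the coefficients assumed above.
    a_mu = 2.0 * v_mu * v_sigma / sigma
    a_sigma = v_sigma ** 2 / sigma - v_mu ** 2 / (2.0 * sigma)
    return (v_mu, v_sigma, a_mu, a_sigma)

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(base, slope, factor):
        return tuple(b + factor * s for b, s in zip(base, slope))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(state, k1, h / 2.0))
    k3 = geodesic_rhs(shifted(state, k2, h / 2.0))
    k4 = geodesic_rhs(shifted(state, k3, h))
    return tuple(
        s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def speed_squared(state):
    """Squared length of the velocity in the assumed metric."""
    _, sigma, v_mu, v_sigma = state
    return (v_mu ** 2 + 2.0 * v_sigma ** 2) / sigma ** 2

def integrate(state, h=0.001, steps=1000):
    for _ in range(steps):
        state = rk4_step(state, h)
    return state

start = (0.0, 1.0, 1.0, 0.5)  # (mu, sigma, d mu/dt, d sigma/dt)
end = integrate(start)
print(speed_squared(start), speed_squared(end))
```

Along a geodesic of a metric connection the speed is constant, so the two printed values should agree up to integration error. Rescaling the parameter by an affine map \( s=at+b\) multiplies the speed by a constant but leaves the curve a solution, in line with the remark on canonical parametrizations.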
The linear connection operator \( \nabla\) is associated with two multi-linear operators \( T(X,Y)\) and \( R(X,Y)\) :
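The torsion operator: \[ T(X,Y)=\nabla_X Y-\nabla_Y X-[X,Y]. \]
The curvature operator: \[ R(X,Y)Z=\nabla_X\nabla_Y Z-\nabla_Y\nabla_X Z-\nabla_{[X,Y]}Z. \]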
where \( [X,Y] = XY - YX\) is the commutator.
Remark 15
If both the torsion and curvature vanish identically, then:
Definition 95
A submanifold \( N\) of a manifold \( M\) equipped with a linear connection \( \nabla\) is called totally geodesic if, for any two points of \( N\) , it contains the whole geodesic passing through these points.
Lemma 6
A submanifold \( N \subset M\) is totally geodesic if, for any tangent vector fields \( X, Y\) on \( N\) , the vector field \( Z = \nabla_X Y\) is also tangent to \( N\) .
Proof
The proof is evident.
Remark 16
A submanifold \( N \subset M\) is totally geodesic if every locally shortest curve in \( N\) is also locally shortest in \( M\) .
Let us return to the framework of Klein geometry and consider a category of topological spaces, where morphisms reflect the fundamental structures of geometric transformations.
Definition 96
Let \( \mathcal{C}\) be a category of topological spaces, and let \( \nabla\) be a linear connection on each object of \( \mathcal{C}\) . The connection \( \nabla\) is said to be absolutely equivariant if, for any morphism \( \varphi: X \to Y\) in \( \mathcal{C}\) , the image of every geodesic in \( X\) under \( \varphi\) is a geodesic in \( Y\) .
This definition encapsulates the idea that a natural connection should be preserved under morphisms of the category, ensuring that geodesic structures remain invariant under transformations.
To better understand the implications of an absolutely equivariant connection, let us recall the fundamental properties of geodesics in canonical coordinates.
Thus, the notion of geodesic symmetry emerges naturally in the categorical framework: an equivariant connection ensures that midpoint operations behave consistently across all objects and morphisms of the category.
In the previous chapter, we introduced key ideas in probability and statistics and briefly alluded to a more structural perspective using category theory and differential geometry. However, these tools have not yet been systematically employed. In this chapter, we take a decisive step in that direction by developing a structured approach relating to category theory and differential geometry in the study of statistical manifolds.
The categorical viewpoint allows us to formalize probabilistic transformations, treating probability distributions as objects and Markov kernels as morphisms. At the same time, differential geometry provides powerful tools for analyzing the geometric structure of statistical manifolds, particularly through notions such as connections, curvature, and geodesics. By combining these perspectives, we gain deeper insight into the intrinsic structure of statistical models and their transformations.
This chapter is devoted to making these ideas explicit. We will establish the categorical framework for statistical manifolds and then show how geometric methods naturally arise within this setting. The interplay between these two formalisms will not only clarify fundamental structures but also pave the way for further developments in information geometry, learning theory, and beyond.
In order to rigorously define the notion of statistical manifolds, we introduce five fundamental collections of probability distributions, which serve as building blocks for our categorical formulation:
These collections are formally defined as follows:
Definition 97
Let \( (\Omega, \mathbf{S})\) be a measurable space. We define:
The following result formalizes the correspondence between these spaces:
Proposition 4
The collections \( \mathsf{Var}(\Omega, \mathbf{S})\) and \( \mathsf{Cap}(\Omega, \mathbf{S})\) are in one-to-one correspondence.
Proof
The claim follows from the structural equivalence of probability distributions in both settings.
Proposition 5
The collections \( \mathsf{Cap}, \mathsf{Capd}, \mathsf{Caph}, \mathsf{Conh}, \mathsf{Var}\) each form a manifold equipped with a corresponding atlas.
Proof
See Chentsov, pp. 73-74.
A statistical problem is naturally associated with the measurable space \( (\Omega, \mathbf{S})\) of sample outcomes, together with:
Definition 98
A statistical decision rule is a transition probability distribution \( \Pi(\omega, d\epsilon)\) describing a Markov random transition from the sample space \( (\Omega, \mathbf{S})\) to the inference space \( (\Omega', \mathbf{S'})\) .
Remark 17
Any transition probability distribution \( \Pi(\omega, d\epsilon)\) may be interpreted as a decision rule within any statistical model whose sample space is \( (\Omega, \mathbf{S})\) and whose inference space is \( (\Omega', \mathbf{S'})\) .
Furthermore, given a Markov random transition from \( \Omega'\) to \( \Omega''\) described by a transition probability \( \Pi(\omega', d\omega'')\) , we obtain a Markov morphism between the measurable spaces \( (\Omega', \mathbf{S'})\) and \( (\Omega'', \mathbf{S''})\) .
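For finite sample and inference spaces, a transition probability reduces to a row-stochastic matrix, and the composition of Markov transitions reduces to a matrix product. The following is a minimal sketch under that finiteness assumption; the function names and the numerical values are hypothetical and chosen only for illustration.

```python
# Finite-state sketch: a Markov transition Pi(omega, d eps) is stored as a
# row-stochastic matrix kernel[i][j] = probability of decision j given outcome i.

def push_forward(p, kernel):
    """Return the distribution P Pi, where (P Pi)(j) = sum_i P(i) * Pi(i, j)."""
    return [sum(p[i] * kernel[i][j] for i in range(len(p)))
            for j in range(len(kernel[0]))]

def compose(kernel_ab, kernel_bc):
    """Return the composite transition (Pi1 Pi2)(i, k) = sum_j Pi1(i, j) * Pi2(j, k)."""
    return [[sum(row[j] * kernel_bc[j][k] for j in range(len(kernel_bc)))
             for k in range(len(kernel_bc[0]))]
            for row in kernel_ab]

def is_stochastic(kernel, tol=1e-12):
    """Check that every row is a probability vector."""
    return all(min(row) >= 0.0 and abs(sum(row) - 1.0) < tol for row in kernel)

# Hypothetical data: three sample outcomes, two inferences, then a second step.
decision_rule = [[1.0, 0.0], [0.5, 0.5], [0.2, 0.8]]
second_step = [[0.9, 0.1], [0.3, 0.7]]
p = [0.2, 0.3, 0.5]

composite = compose(decision_rule, second_step)
q_two_steps = push_forward(push_forward(p, decision_rule), second_step)
q_composite = push_forward(p, composite)
```

Applying the two transitions one after the other, or applying their composite once, yields the same distribution. This associativity is what allows Markov transitions to serve as morphisms.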
Theorem 21
The class of objects:
equipped with the system of Markov morphisms defined by
forms the categories of statistical decisions \( \mathsf{CAP}, \mathsf{CAPD}, \mathsf{Caph}, \mathsf{Conh}\) , respectively, which are isomorphic to categories of statistical decision rules.
Proof
The proof follows from the categorical structure imposed by the Markov morphisms and the natural compatibility of the decision rule formulation.
In the theory of statistical manifolds, four types of manifolds are of particular interest:
Consider the category \( \mathsf{Caph}F\) of collections \( \mathsf{\mathsf{Caph}}(\Omega,\mathbf{S},\mathbf{Z})\) with finite quotient algebras \( \mathbf{S}/ \mathbf{Z}\) . These collections are finite-dimensional manifolds. Any \( \mathbf{Z}\) -dominated measure on \( (\Omega, \mathbf{S})\) , where \( \mathbf{S}/ \mathbf{Z}\) is finite, is completely determined by the vector
Definition 99
Let \( \mathsf{\mathsf{Conh}}(\Omega,\mathbf{S},\mathbf{Z})\) be the collection of all nonnegative, mutually absolutely continuous measures on \( (\Omega,\mathbf{S})\) that vanish on \( \mathbf{Z}\) -sets and only there. Then if \( S_m \simeq \mathbf{S}/\mathbf{Z}\) , the correspondence
and
defines a chart of the entire cone \( \mathsf{Conh}\) . This chart is called a natural chart. The atlas of \( \mathsf{Conh}\) also includes other charts obtained from the natural chart through analytic or infinitely differentiable coordinate transformations. However, the natural chart holds a privileged position.
Proposition 6
The collection \( \mathsf{Caph}(\Omega,\mathbf{S},\mathbf{Z})\) is selected from \( \mathsf{Conh}(\Omega,\mathbf{S},\mathbf{Z})\) by the condition
making it a hypersurface. Specifically, \( \mathsf{Caph}(\Omega, \mathbf{S}, \mathbf{Z})\) is the intersection of the cone \( \mathsf{Conh}(\Omega,\mathbf{S},\mathbf{Z})\) , on which \( \mu_j = \mu\{A_j\} > 0\) for \( j=1,…,m\) , with the hyperplane defined by \( \langle \mu, I \rangle = 1\) .
The vectors \( \mathbf{p}=(P\{A_1\},…,P\{A_m\})\) corresponding to \( P\{\cdot\}\) exhaust the interior of the unit simplex
where \( p_j\geq 0\) , \( \sum_{j=1}^m p_j=1\) , and \( \mathbf{e}_1=(1,0,…,0),\mathbf{e}_2=(0,1,0,…,0),…, \mathbf{e}_m = (0,0,0,…,0,1)\) are the vertices of the simplex.
Definition 100
The probabilities \( P\{\mathbf{A}_j\}=p_j\{P\}\) simultaneously serve as:
Natural coordinates: The bijection
identifies points in the ambient space.
Remark 18
Let \( \mathsf{Caph}\) be a manifold and consider a surface within it. The natural coordinates \( \mathbf{p}=(p_1,…,p_m)\) are not local coordinates of the manifold \( \mathsf{Caph}\) .
Proof
The coordinates \( p_1,…,p_m\) are linearly dependent on the surface since \( \sum_{j=1}^m p_j = 1\) . This reduces the intrinsic dimension to \( m-1\) , rendering one coordinate redundant.
To define a local coordinate system for \( \mathsf{Caph}\) , we discard one component of \( \mathbf{p}\) .
For example, omitting \( p_m\) , we obtain the chart:
However, such charts lack invariance under permutations of the atoms \( A_1,…, A_m\) . To resolve this:
Introduce equivalence classes under scaling:
where \( \mathbf{p}\) is defined up to a positive multiplicative constant. This ensures permutation invariance.
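The two constructions above can be sketched for a finite number of atoms. The helper names are hypothetical, and the second construction is read here as choosing, on each ray of positive weights, the representative that sums to one.

```python
def chart_drop_last(p):
    """Local coordinates on the open simplex: forget the last component p_m."""
    return list(p[:-1])

def chart_inverse(x):
    """Recover p from (p_1, ..., p_{m-1}) using p_m = 1 - (p_1 + ... + p_{m-1})."""
    return list(x) + [1.0 - sum(x)]

def normalize(weights):
    """Map positive weights, defined up to a positive factor, to the simplex."""
    total = sum(weights)
    return [w / total for w in weights]

p = [0.2, 0.3, 0.5]
x = chart_drop_last(p)                   # (p_1, p_2)
p_back = chart_inverse(x)                # the original point again
same_point = normalize([2.0, 3.0, 5.0])  # rescaled weights give the same point
permuted = normalize([5.0, 2.0, 3.0])    # permuting atoms permutes coordinates
```

The first chart singles out the atom \( A_m\) , whereas the description by weights up to scaling treats all atoms alike: permuting the weights simply permutes the resulting probabilities.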
Another way to resolve the problem of chart invariance is to introduce rectilinear coordinates. Let \( f_j:\Omega\to \mathbb{R}\) , \( (j=0,…,m-1)\) be \( \mathbf{S}\) -measurable functions with \( f_0(\omega)=I(\omega)\) . Assume the matrix \( \big(f_j(\omega_i)\big)\) has full rank \( m\) . Define coordinates:
where \( \mathbf{M}_Pf(\omega)\) is the expectation of the random variable \( f(\omega)\) , defined as the following integral:
where the function \( f\) must be integrable with respect to the measure \( P\) (or quasi-integrable).
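On a finite sample space the expectation becomes a finite sum, \( \mathbf{M}_P f = \sum_i f(\omega_i)\, P\{A_i\}\) , and the full-rank assumption means that the expectations determine the distribution. The sketch below illustrates this under the assumption of three atoms; the functions \( f_1, f_2\) and all helper names are hypothetical.

```python
def expectations(F, p):
    """t_j = M_P f_j = sum_i f_j(omega_i) * p_i, where F[j][i] = f_j(omega_i)."""
    return [sum(fj[i] * p[i] for i in range(len(p))) for fj in F]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[r]] + [float(b[r])] for r in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [a - factor * mc for a, mc in zip(M[r], M[c])]
    return [M[r][n] / M[r][r] for r in range(n)]

# Hypothetical choice: f_0 = I (constant one), f_1 and f_2 as below.
F = [[1.0, 1.0, 1.0],
     [0.0, 1.0, 2.0],
     [0.0, 1.0, 4.0]]
p = [0.2, 0.3, 0.5]

t = expectations(F, p)      # t[0] is always 1 because f_0 = I
p_recovered = solve(F, t)   # full rank: the expectations determine p
```

Since \( t_0 = 1\) identically, only \( t_1,\dots,t_{m-1}\) carry information, which matches the dimension \( m-1\) of the manifold.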
Definition 101
Let \( \mathsf{Caph}\) be an \( (m-1)\) -dimensional manifold. A local coordinate system \( \mathbf{t}= (t_1,…,t_{m-1})\) for \( \mathsf{Caph}\) is called natural if it is induced by the expectations:
and such that
where \( f_j(\omega)\) are linearly independent functionals and each \( f_j\) is \( P\) -quasi-integrable.
The introduction of canonical affine coordinates in statistical manifold theory arises naturally from fundamental geometric and probabilistic considerations. Let us first define the indicator function
These functions are not necessarily linearly independent; however, any \( m-1\) of them, together with the identity function \( I(\omega)\) , form a linearly independent set.
Definition 102
A coordinate system \( \mathbf{s}=(s^1,\dots,s^{m-1})\) on the open simplex \( \mathsf{Caph}(\Omega, \mathbf{S}, \mathbf{Z})\) is called canonical affine if
where \( P_0\) is the origin and \( \mathbf{s}\) is a covariant linear coordinate system of the group of translations.
More explicitly, if \( P_0\{.\}\) represents the probability distribution at the origin and the functions \( g_0(\omega) = 1\) , \( g_1(\omega),\dots,g_{m-1}(\omega)\) define the coordinate axes, then:
Here, \( \mu\{.\}\) is an arbitrary \( \mathbf{Z}\) -positive measure vanishing on \( \mathbf{Z}\) -sets only, and the normalization factor \( \exp[\Psi(\mathbf{s})]\) is given by:
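To build intuition in the finite case, one may experiment with the usual exponential form \( P_{\mathbf{s}}\{A_i\} = P_0\{A_i\}\exp[\sum_j s^j g_j(\omega_i) - \Psi(\mathbf{s})]\) . This form is an assumption made for the sketch below, as are the choice of axes (indicators of the first two atoms) and the function name `tilt`.

```python
import math

def tilt(p0, g, s):
    """Return (P_s, Psi) with P_s[i] = p0[i] * exp(sum_j s[j] * g[j][i] - Psi)."""
    weights = [p0[i] * math.exp(sum(s[j] * g[j][i] for j in range(len(s))))
               for i in range(len(p0))]
    psi = math.log(sum(weights))  # logarithm of the normalization factor
    return [w * math.exp(-psi) for w in weights], psi

# Hypothetical data: three atoms, origin P_0 at the center of the simplex,
# axes g_1, g_2 taken to be the indicators of the first two atoms.
p0 = [1.0 / 3.0] * 3
g = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

s, s_shift = [0.5, -1.0], [0.25, 0.75]
p_s, _ = tilt(p0, g, s)
p_two_steps, _ = tilt(p_s, g, s_shift)
p_one_step, _ = tilt(p0, g, [a + b for a, b in zip(s, s_shift)])
log_ratios = [math.log(p_s[i] / p_s[2]) for i in range(2)]
```

Under these assumptions, tilting by \( \mathbf{s}\) and then by a second vector agrees with tilting once by their sum, so the coordinates add under translations. With the uniform origin chosen here, the coordinates are recovered as the log-ratios \( \log(p_i / p_m)\) .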
To analyze the structure of statistical manifolds, we introduce tangent vectors and vector fields. Let \( \{P_t\}\) be a family of probability distributions parameterized by a smooth curve \( \mathbf{x}(t)\) . The functional \( (Y)_{P}\) of smooth functions \( f(\mathbf{x})\) is defined as:
which represents a tangent vector at the point \( P=P_{\theta}\) .
Lemma 7
The curve \( P_t\) admits the following decomposition:
where \( \tau = t - \theta\) , \( P'_{\theta}\) is a charge of bounded variation with total measure zero, i.e., \( P'_{\theta}\{\Omega\} = 0\) , and the norm of the remainder term vanishes faster than \( \tau\) .
Proof
Consider the representation \( P_t\{.\} \leftrightarrow \mathbf{p}(t) = (p_1(t),\dots,p_{m-1}(t),p_m(t))\) , where \( p_m(t)= 1 - p_1(t) - \dots - p_{m-1}(t)\) . Near \( t = \theta\) , we expand:
Adding these expansions and subtracting from unity, we obtain the corresponding expression for \( p_m\) . Defining the auxiliary charges:
we obtain:
Using the chain rule,
The result follows from continuity and the existence of fiberings of the simplex into smooth trajectories.
In classical differential geometry, vector fields are typically expanded in a coordinate basis \( X_i = \frac{\partial}{\partial x^i}\) . However, in statistical manifold theory, this approach is often inconvenient due to the transformation properties of probability distributions. Instead, we construct an alternative basis.
We define \( m\) vector fields \( Y_j\) such that:
Additionally, we introduce \( m\) vector fields \( X_j\) satisfying:
To understand the structure of these vector fields, consider the simplex and an arbitrary segment within it. Among all possible segments, we identify those connecting a vertex \( \mathbf{e}_i\) to a point on the opposite face. These segments, known as Ceva lines, define a special class of trajectories in the simplex.
Let \( \mathbf{p}(t; q_2,\dots,q_m)\) be a Ceva line. More precisely, we define it as
Lemma 8
The vector fields \( X_i = \frac{\partial}{\partial x^i}\) , where \( (x^i,\dots,x^n)\) is the field of differentiation along the Ceva line with respect to the parameter \( t\) , satisfy the differential equation
In particular, for \( i=1\) , we obtain
Remark 19
The vector field \( Y_1\) corresponds to the choice of parameter \( p_1(t) =1 - \exp\{ -t \}\) . The remaining fields \( X_j\) and \( Y_j\) are obtained analogously along the Ceva lines:
where \( p_j(t)\) for \( X_j\) satisfies the differential equation
Similarly, for the field \( Y_j\) , we have \( p_j(t)= 1 - \exp\{-t\}p\) .
Proof
We compute \( \dot{p_i}(t)\) , which defines the tangent vector of differentiation with respect to \( t\) . For \( i=1\) , we have
Additionally, we obtain
Since differentiation with respect to \( t\) satisfies \( \frac{d}{dt} \longleftrightarrow p_1(t) \mathbf{e}_1 - p_1(t)\mathbf{p}(t)\) , we conclude the proof of the lemma.
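The correspondence \( \frac{d}{dt} \longleftrightarrow p_1 \mathbf{e}_1 - p_1\mathbf{p}\) can be checked numerically. The sketch assumes that the Ceva line through the vertex \( \mathbf{e}_1\) is parametrized as \( \mathbf{p}(t) = p_1(t)\,\mathbf{e}_1 + (1-p_1(t))\,(0,q_2,\dots,q_m)\) with \( \sum_j q_j = 1\) , and that \( p_1(t)\) is the logistic function, which solves \( \dot{p}_1 = p_1(1-p_1)\) . Both choices, and all function names, are assumptions made for illustration.

```python
import math

def ceva_point(p1, q):
    """Point with first coordinate p1 on the segment from e_1 to (0, q_2, ..., q_m)."""
    return [p1] + [(1.0 - p1) * qj for qj in q]

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def exponential(t):
    return 1.0 - math.exp(-t)

def velocity(p1_of_t, q, t, h=1e-6):
    """Central-difference derivative of the curve t -> ceva_point(p1_of_t(t), q)."""
    ahead = ceva_point(p1_of_t(t + h), q)
    behind = ceva_point(p1_of_t(t - h), q)
    return [(a - b) / (2.0 * h) for a, b in zip(ahead, behind)]

q = [0.25, 0.75]  # hypothetical point on the face opposite e_1
t = 0.3

# Logistic parameter: the velocity should equal p_1 e_1 - p_1 p.
p = ceva_point(logistic(t), q)
x1_target = [p[0] * ((1.0 if i == 0 else 0.0) - p[i]) for i in range(3)]
x1_velocity = velocity(logistic, q, t)

# Parameter p_1(t) = 1 - exp(-t): the velocity should equal e_1 - p.
r = ceva_point(exponential(t), q)
y1_target = [(1.0 if i == 0 else 0.0) - r[i] for i in range(3)]
y1_velocity = velocity(exponential, q, t)
```

In both cases the curve stays on the same segment; only the speed along it changes, by the factor \( p_1\) .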
Lemma 9
The system of tangent vectors
is complete at each point of the manifold \( \mathsf{Caph}\) . That is, any tangent vector \( (Z)_{\mathbf{p}}\) may be expanded in terms of the \( (X_j)_{\mathbf{p}}\) , and any \( m-1\) vectors \( (X_j)_{\mathbf{p}}\) form a basis.
Every vector field \( Z\) admits a unique smooth expansion:
subject to the additional condition
All other expansions of the form \( Z= \sum_{j=1}^m \eta^j X_j\) satisfy
where \( \phi(\mathbf{p})\) is a scalar field.
Remark 20
If we replace the condition \( \sum_{i=1}^m \zeta^i p_i = 0\) by
then the above lemma remains valid.
Proof
Let \( (Z)_{\mathbf{p}} \leftrightarrow \mu\{\cdot\}\) , where \( \mu\{\Omega\}=0\) . Then
Since \( (X_j)_{\mathbf{p}} \longleftrightarrow p_j \mathbf{e}_j - p_j \mathbf{p}\) , we obtain:
Thus,
Since \( \sum_{j=1}^m X_j = {\mathbf{0}}\) , it follows that \( X_k = - \sum\limits_{j \neq k} X_j\) . Inserting this expression into \( (Z)_{\mathbf{p}}\) , we obtain an expansion in a basis of \( m-1\) vectors.
Let \( \sum\limits_j \zeta^j X_j = Z = \sum\limits_j \eta^j X_j\) be two expansions of \( Z\) . Then
or equivalently,
Since \( X_2,\dots,X_m\) form a basis, we deduce:
From
we see that \( \sum_{j=1}^m \zeta^j(\mathbf{p}) p_j=0\) if and only if \( \sum_{j=1}^m\eta^j (\mathbf{p}) p_j = \phi(\mathbf{p})\) . Thus, the lemma is proved.
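The statements of the lemma can be verified numerically at a single point of the simplex, using only the correspondence \( (X_j)_{\mathbf{p}} \longleftrightarrow p_j \mathbf{e}_j - p_j \mathbf{p}\) . The coefficients \( \zeta^j = \mu_j / p_j\) used below are our own candidate for the normalized expansion, not a formula quoted from the text; the numerical values are hypothetical.

```python
def x_field(j, p):
    """Vector p_j e_j - p_j p attached to X_j at the point p."""
    return [p[j] * ((1.0 if i == j else 0.0) - p[i]) for i in range(len(p))]

def combine(coeffs, p):
    """Components of sum_j coeffs[j] * X_j at the point p."""
    m = len(p)
    fields = [x_field(j, p) for j in range(m)]
    return [sum(coeffs[j] * fields[j][i] for j in range(m)) for i in range(m)]

p = [0.2, 0.3, 0.5]
mu = [0.1, -0.4, 0.3]                      # tangent vector: components sum to 0

total = combine([1.0, 1.0, 1.0], p)        # sum of all X_j
zeta = [mu[j] / p[j] for j in range(3)]    # candidate normalized coefficients
phi = 0.7                                  # arbitrary scalar shift
eta = [z + phi for z in zeta]              # another expansion of the same vector

constraint_zeta = sum(z * pj for z, pj in zip(zeta, p))
constraint_eta = sum(e * pj for e, pj in zip(eta, p))
```

Here `total` vanishes, both `combine(zeta, p)` and `combine(eta, p)` reproduce `mu`, and the weighted sums of the coefficients equal \( 0\) and \( \phi\) respectively.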
Let us now examine the interplay between the vector fields \( X_j\) and canonical affine coordinates.
Lemma 10
In a canonical affine coordinate system aligned with the directions \( \epsilon_j(\omega)\) for \( j\neq k\) , the coordinate curves parametrized by
correspond to the Ceva lines of the \( i\) -th family. The tangent vector fields associated with these curves are given by
Proof
Consider the probability measure \( P_0\) at the center of the simplex:
Let \( k=m\) and integrate with respect to \( \mu\) over the atom \( A_i\) in the expression
where \( \mu\) is an arbitrary positive measure vanishing only on \( \mathbf{Z}\) -sets. Noting that \( \epsilon_j(\omega) = 0\) on \( A_i\) for \( j\neq i\) , we obtain
for \( i < m\) , and
where
Thus,
Consider the coordinates \( (x_0, x^1, x^2,\dots, x^{m-1})\) and let only \( x^1\) vary, that is, \( x^1 = x_0^1 +t\) , with \( P_{x(t)}= R_t\) and \( b_i(t) = B_t\{ A_i\}\) , while \( x^2, \dots, x^{m-1}\) remain fixed.
Then we have
where
Summing over \( i=2,\dots,m\) , we obtain
This confirms that the coordinate line is a Ceva line.
It remains to calculate \( b_1(t)\) . Substituting our expression for \( \exp\{- \Phi(t)\}\) , we obtain
and differentiating with respect to \( t\) ,
This establishes the required result. The remaining cases follow by permutation of atoms and coordinates.
Theorem 22
A tensor field defined on the simplex \( \mathsf{Caph}\) is invariant under the group of translations if and only if its components are constant with respect to the canonical coordinate system.
Proof
Consider the canonical coordinate system \( (z_1, z_2,\dots,z_{m-1})\) . In these coordinates, the simplex becomes an \( (m-1)\) -dimensional affine space, where translations correspond to parallel shifts.
Since the group of parallel translations is both transitive and simply transitive, we conclude that carrying the components of a tensor unchanged across all points yields an invariant field. Let \( f(\mathbf{p})=c\) be an invariant scalar field. For any point \( \mathbf{q}\) , there exists a unique translation mapping \( \mathbf{p}\) to \( \mathbf{q}\) , denoted by \( T\mathbf{p}=\mathbf{q}\) . Since \( f^T = f\) , we obtain
Hence, \( f(\mathbf{q})\) is necessarily constant.
Now, let
Under translation, canonical coordinates shift as
which implies
If \( Y\) is invariant, then
Since the \( \frac{\partial}{\partial z^j}\) form a basis at each point, it follows that
Hence, the theorem is proved.
Corollary 2
On the manifold \( \mathsf{Caph}\) , there exist translation-invariant Riemannian metrics that convert \( \mathsf{Caph}\) into a Euclidean space.
Definition 103
A connection is said to be flat if
Remark 21
The flat connection defined above is compatible with any translation-invariant Riemannian metric.
Some easy exercises.
Exercise 37, correction.
Proof The proof is divided into three parts.
Firstly, since any measure on \( \mathbf{\Sigma}\) can be extended to a measure on \( \mathbf{\Sigma^*}\) , where it coincides with the induced inner and outer measures, it remains to show that for any fixed \( \mathcal{A} \in \mathbf{\Sigma^*}\) , the transition probability function \( \omega \mapsto \Pi^*(\omega, \mathcal{A})\) is \( \mathbf{S^*}\) -measurable.
Having established the first part of the proof, we now proceed to the second step. Our aim is to show that the real-valued function
is \( \mathbf{S^*}\) -measurable for any fixed \( \mathcal{A} \in \mathbf{\Sigma^*}\) . To do so, it suffices to verify that for any probability measure \( P\) on \( \mathbf{S}\) and any real number \( z \in [0,1]\) , the outer and inner \( P\) -measures of the preimage
of the half-open interval \( [0, z)\) coincide, i.e.,
which ensures that \( U\) is indeed \( \mathbf{S^*}\) -measurable.
Consider probability measures \( P\) and \( Q\) . For any \( \mathcal{A} \in \mathbf{\Sigma^*}\) , there exist two sets \( \mathcal{G}, \mathcal{F} \in \mathbf{\Sigma}\) satisfying:
such that for every \( \omega\) , the following equalities hold:
Since probability measures are monotone, we obtain the inequalities:
Therefore, the functions \( \Pi(\omega; \mathcal{G})\) and \( \Pi(\omega; \mathcal{F})\) are \( \mathbf{S}\) -measurable and equal almost everywhere, since
except possibly on an \( \mathbf{S}\) -measurable \( P\) -null set \( N\) , that is, \( P\{N\} = 0\) .
From this, it follows that:
Moreover, we also have:
Thus, we conclude:
which proves the second part.
Finally, we proceed to the third and last step: proving that the measures \( (P\Pi)^*\) and \( P^*\Pi^*\) , both defined on \( (\Omega',\mathbf{\Sigma^*})\) , coincide. That is, we claim that for any \( \mathcal{A} \in \mathbf{\Sigma^*}\) , we have:
Let us choose sets \( \mathcal{F}\) and \( \mathcal{G}\) satisfying for a given \( \mathcal{A}\) :
Then, as before, we have:
where \( Q = P \Pi\) . Given that the function \( \Pi_{\mathcal{A}}^*(\omega) = \Pi^*(\omega; \mathcal{A})\) is \( \mathbf{S^*}\) -measurable, we obtain:
By monotonicity, we conclude:
Thus, we deduce:
Since this holds for all \( \mathcal{A} \in \mathbf{\Sigma^*}\) , we obtain the desired equality:
This completes the proof.
Exercise 38.
The measure \( P_0\) is dominating since for each \( P_k\) we have the bound
and the inclusion of null sets:
If we can show that \( P_0\) is a constructive probability measure, the desired conclusion follows from Lemma 3.
Consider a measurable mapping \( \omega = f_k(x)\) , where \( f_k : \mathbf{E} \to \Omega\) determines the constructive distribution
Define \( f_0\) piecewise as follows: for \( 2^{-k} < x < 2^{-k+1}\) , let
and set \( f_0(0) = f_1(0)\) .
This function is measurable, as it is composed of countably many measurable functions, each defined on a distinct measurable subset. Consequently, the \( f_0\) -preimage of any \( \mathbf{S}\) -set is a countable union of disjoint \( g_k\) -preimages.
Furthermore, we have the measure transformation:
since the \( g_k\) -preimage is the \( f_k\) -preimage shifted by \( 2^{-k}\) and contracted by a factor of \( 2^k\) . This completes the proof.

A Frobenius manifold \( \mathcal{M}\) is a geometric realization of a certain class of nonlinear partial differential equations of order \( 3\) . These equations, commonly referred to as the WDVV equations (named after Witten, Dijkgraaf, Verlinde, and Verlinde), are also known in some mathematical contexts as the Associativity Equations.
There is a subtle distinction between Frobenius manifolds and the WDVV equations. While the notion of Frobenius manifolds represents a geometric object, the WDVV equations belong to the realm of analysis. The terminology “Frobenius manifolds” and “Associativity equations” appears predominantly in mathematical literature, whereas “WDVV equations” is more frequently encountered in discussions closer to physics.
These perspectives, however, are deeply interwoven. As shown by Yu. I. Manin [17] and B. Dubrovin [16], they ultimately coincide.
In the context of physics, solutions of the WDVV equations encode the moduli spaces of topological conformal field theories. These solutions are pivotal in the formulation of mirror symmetry for Calabi–Yau \( 3\) -folds. Notably, specific solutions to the WDVV equations with particular properties serve as generating functions for the Gromov–Witten invariants of Kähler or symplectic manifolds.
In the 1990s, the concept of Frobenius manifolds gained traction among both algebraic and analytic geometers due to the discovery of three significant classes, all stemming from the profound interplay between mathematics and physics. This development led to the creation of advanced mathematical tools and generated an abundance of challenging and intriguing problems. Some of these arose naturally from attempts to provide a rigorous mathematical interpretation of mirror symmetry (e.g., Homological Mirror Symmetry). Others emerged from the ambition to axiomatize and deepen the mathematical understanding of topological quantum field theory.
The first class of Frobenius manifolds originated in the study of singularities. Specifically, Saito spaces—predating the explicit definition of Frobenius manifolds—were associated with the unfolding spaces of singularities. A particularly straightforward example is the space of complex polynomials of a fixed degree \( d\) with distinct roots, which can also be identified with the configuration space of \( d\) marked points on the complex plane.
Subsequent classes discovered in the 1990s have a formal nature. Examples include the (formal) moduli spaces of solutions to Maurer–Cartan equations modulo gauge equivalence and the formal completions of cohomology spaces of smooth projective (or compact symplectic) manifolds. The latter is commonly referred to as quantum cohomology.
More recently, a new class of Frobenius manifolds has emerged, demonstrating the versatility of this structure. It has been shown that under certain conditions, the manifold of probability distributions can generate a Frobenius manifold (this is the Combe–Manin construction). This discovery inspires many open questions, particularly regarding the complete classification of Frobenius manifolds, the identification of novel classes, and the elucidation of the intricate relationships between the existing ones.
In the first section of this chapter, we provide a concise survey of Frobenius manifolds. We introduce the definitions of Frobenius, pre-Frobenius, and potential pre-Frobenius manifolds and briefly outline the WDVV equations. We also touch upon the hidden algebraic structures intrinsic to Frobenius manifolds. This section concludes with a brief summary.
In the second section, we delve more deeply into the theoretical framework of Frobenius manifolds.
In the third section, we present a collection of exercises and open problems for the reader to explore.
An essential ingredient for the definition of Frobenius manifolds is the concept of affine flat structures. For simplicity, we shall often refer to these as affine structures.
Let \( \mathcal{M}\) be a smooth manifold of dimension \( n\) . Affine flat structures can be defined in multiple equivalent ways.
An affine structure on an \( n\) -dimensional manifold \( \mathcal{M}\) is defined via a collection of coordinate charts \( \{(U_\alpha, \phi_\alpha)\}\) , where \( \{U_\alpha\}\) forms an open cover of \( \mathcal{M}\) , and \( \phi_\alpha: U_\alpha \to \mathbb{R}^n\) is a local coordinate system such that the transition functions \( \phi_\beta \circ \phi_\alpha^{-1}\) are affine transformations on \( \phi_\alpha(U_\alpha \cap U_\beta)\) , mapping it to \( \phi_\beta(U_\alpha \cap U_\beta)\) .
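The requirement on transition functions is consistent on triple overlaps because affine maps are closed under composition: if \( x \mapsto A_1 x + b_1\) and \( y \mapsto A_2 y + b_2\) are affine, so is their composite. A minimal sketch, with hypothetical function names and hypothetical integer data:

```python
def affine_apply(A, b, x):
    """Evaluate the affine map x -> A x + b."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) + b[i]
            for i in range(len(b))]

def affine_compose(A2, b2, A1, b1):
    """Return (A, b) for x -> A2 (A1 x + b1) + b2, i.e. A = A2 A1, b = A2 b1 + b2."""
    n = len(b1)
    A = [[sum(A2[i][k] * A1[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    b = [sum(A2[i][k] * b1[k] for k in range(n)) + b2[i] for i in range(n)]
    return A, b

# Two hypothetical transition functions on the plane.
A1, b1 = [[1, 2], [0, 1]], [1, -1]
A2, b2 = [[0, -1], [1, 0]], [3, 0]
x = [2, 5]

step_by_step = affine_apply(A2, b2, affine_apply(A1, b1, x))
A, b = affine_compose(A2, b2, A1, b1)
in_one_step = affine_apply(A, b, x)
```

The linear parts multiply while the translation parts are twisted by the linear part, which is the semidirect product structure of the affine group.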
Recall that the group of affine transformations is given by:
Definition 104
An affine manifold is a smooth manifold equipped with an affine structure.
The existence of an affine flat structure on \( \mathcal{M}\) is equivalent to the presence of a specific class of connections on the tangent bundle of \( \mathcal{M}\) . Namely, there is a bijective correspondence between affine flat structures and flat, torsion-free affine connections \( \nabla\) on \( \mathcal{M}\) .
In the context of differential geometry, a manifold over \( \mathbb{K}\) with an affine structure is characterized by a tangent bundle whose underlying \( G\) -structure corresponds to the group of affine transformations \( \mathrm{Aff}(n) = GL(n, \mathbb{K}) \rtimes \mathbb{K}^n\) , where \( GL(n, \mathbb{K})\) is the general linear group over the field \( \mathbb{K}\) .
Alternatively, an affine flat structure can be described by a subsheaf \( \mathcal{T}_\mathcal{M}^f \subset \mathcal{T}_\mathcal{M}\) of linear spaces of pairwise commuting vector fields. Locally, one has the tensor product over the ground field:
Sections of \( \mathcal{T}_\mathcal{M}^f\) correspond to flat vector fields. Furthermore, the metric \( g\) is compatible with the structure \( \mathcal{T}_\mathcal{M}^f\) if \( g(X, Y)\) is constant for all flat vector fields \( X\) and \( Y\) (Exercises \( 12.1\) and \( 12.2\) ).
A group \( \Lambda\) is called an \( n\) -crystallographic group if it contains a normal, torsion-free, maximal abelian subgroup of rank \( n\) and finite index. Crystallographic groups fit into the short exact sequence:
where \( V\) is a complex vector space, and \( P \leq GL(n, \mathbb{Z}) \cong \mathrm{Aut}(V)\) is a finite group acting faithfully on \( V\) .
A complex crystallographic group is a discrete subgroup \( \Lambda \subset \mathrm{Iso}(\mathbb{C}^n)\) such that \( \mathbb{C}^n / \Lambda\) is compact, where \( \mathrm{Iso}(\mathbb{C}^n)\) denotes the group of biholomorphisms preserving the standard Hermitian metric.
We note the following property:
Lemma 11
The fundamental group of a compact, complete flat affine manifold is an affine crystallographic group.
Among the crystallographic groups, one encounters the Bieberbach groups, which are torsion-free crystallographic groups.
This affine structure leads to interesting algebraic properties on the tangent bundle/tangent sheaf. Let \( \Gamma(TM)\) denote the space of vector fields on a given manifold \( \mathcal{M}\) .
Recall from the previous chapters the defining properties of a flat, torsionless connection. An affine connection \( \nabla\) is torsion-free (or torsionless) if
\[ \nabla_X Y - \nabla_Y X = [X, Y] \quad \text{for all } X, Y \in \Gamma(TM). \]
(72)
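The connection is called flat if its curvature vanishes:
\[ \nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z - \nabla_{[X, Y]} Z = 0 \quad \text{for all } X, Y, Z \in \Gamma(TM). \]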
(73)
Such a connection defines a covariant differentiation for vector fields \( X, Y \in \Gamma(TM)\) :
Exercise 39
A pre-Lie algebra is an algebra satisfying the relation
\[ (X \circ Y) \circ Z - X \circ (Y \circ Z) = (Y \circ X) \circ Z - Y \circ (X \circ Z). \]
Show that one can recover the structure of a pre-Lie algebra on the space of vector fields on \( \mathcal{M}\) , by putting \( X \circ Y := \nabla_X(Y)\) for an affine flat, torsionless connection \( \nabla\) .
Example 22
Let \( V\) be a finite-dimensional real Euclidean space endowed with a real inner product, and let \( C\) be a closed convex cone \( C \subset V \) with its vertex at the origin. The polar of the cone is defined by
A symmetric cone satisfies \( C^* = C \) .
Such symmetric cones carry an affine flat structure. The tangent sheaf of this cone is equipped with the structure of a pre-Lie algebra, defined by the operation \( X \circ Y := \nabla_X(Y) \) , where \( X, Y \) are vector fields on \( C\) .
Let us consider two well-known structures in topology: the torus and the Klein bottle. These topological objects are the only compact \( 2\) -dimensional manifolds that admit Euclidean structures.
Assume \( \mathcal{M}\) is a closed surface different from the torus or Klein bottle. Then there exists no affine structure on it. This is a direct consequence of the following result:
Theorem 23 (Benzécri, 1955)
A closed surface admits affine structures if and only if its Euler characteristic vanishes.
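Combined with the classification of closed surfaces, the criterion is easy to tabulate: \( \chi = 2 - 2g\) for the orientable surface of genus \( g\) and \( \chi = 2 - k\) for the connected sum of \( k\) projective planes. The short sketch below uses a hypothetical function name and a hypothetical sample of surfaces.

```python
def euler_characteristic(genus, orientable=True):
    """chi = 2 - 2g (orientable, genus g) or 2 - k (k crosscaps, non-orientable)."""
    return 2 - 2 * genus if orientable else 2 - genus

# (genus or number of crosscaps, orientable?)
surfaces = {
    "sphere": (0, True),
    "torus": (1, True),
    "genus-2 surface": (2, True),
    "projective plane": (1, False),
    "Klein bottle": (2, False),
}

admits_affine_structure = [
    name for name, (g, orientable) in surfaces.items()
    if euler_characteristic(g, orientable) == 0
]
```

Only the torus and the Klein bottle survive the test, in agreement with the discussion above.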
When dealing with manifolds of dimension greater than two, there is no definitive criterion for determining the existence of an affine structure.
In particular, according to Smillie’s theorem, a closed manifold does not admit an affine structure if its fundamental group is built up out of finite groups by taking free products, direct products, and finite extensions. Specifically, a connected sum of closed manifolds with finite fundamental groups admits no affine structure. This result provides a profound insight into the interplay between algebraic and geometric properties in the study of such manifolds.
Certain Seifert fiber spaces also do not possess affine structures. This is clarified by the following statement:
Proposition 7 (Y. Carrière, F. Dal’bo, G. Meigniez)
Let \( \mathcal{M}\) be a Seifert fiber space with vanishing first Betti number. Then, \( \mathcal{M}\) does not admit any affine structure.
We now discuss the complex case. Thanks to the work of Kobayashi [22], a classification has been established.
Theorem 24
In the complex case, one has the following collection of compact complex manifolds that admit affine structures.
However, it is interesting to note that there is no other compact complex surface admitting even holomorphic affine connections.
The pre-Frobenius manifolds exhibit interesting relations to the Monge–Ampère domains. Before we outline such relations, we recall the definition of such manifolds, using the tools that have been previously introduced.
Let us consider an affine flat structure on a manifold \( \mathcal{M}\) . To define such a structure, several ingredients are required:
A symmetric tensor of rank \( 3\) , denoted as:
We define a multiplication operation \( \circ\) on the tangent sheaf \( \mathcal{T}_\mathcal{M}\) .
Define a bilinear symmetric multiplication \( \circ = \circ_{A, g}\) on the tangent sheaf \( \mathcal{T}_\mathcal{M}\) as follows:
such that:
where the prime denotes partial dualization.
A compatibility relation between the rank-\( 3\) tensor \( A\) , the rank-\( 2\) tensor \( g\) , and the multiplication operation \( \circ\) is given by:
This invariance of the metric with respect to multiplication ensures that the structure is well-defined.
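In flat coordinates with constant tensors, the construction can be made concrete. The sketch assumes the usual conventions \( \partial_a \circ \partial_b = \sum_c A_{ab}{}^{c}\,\partial_c\) with \( A_{ab}{}^{c} = \sum_e A_{abe}\, g^{ec}\) , and checks the compatibility in the form \( g(X\circ Y, Z) = A(X,Y,Z) = g(X, Y\circ Z)\) . These conventions, the two-dimensional data, and the function names are assumptions made for illustration.

```python
def structure_constants(A, g_inv):
    """C[a][b][c] = sum_e A[a][b][e] * g_inv[e][c] (raise the last index of A)."""
    n = len(A)
    return [[[sum(A[a][b][e] * g_inv[e][c] for e in range(n)) for c in range(n)]
             for b in range(n)] for a in range(n)]

def multiply(C, u, v):
    """Components of u o v for tangent vectors u, v."""
    n = len(u)
    return [sum(u[a] * v[b] * C[a][b][c] for a in range(n) for b in range(n))
            for c in range(n)]

def pairing(g, u, v):
    return sum(u[a] * g[a][b] * v[b] for a in range(len(u)) for b in range(len(v)))

def cubic_form(A, u, v, w):
    n = len(u)
    return sum(A[a][b][c] * u[a] * v[b] * w[c]
               for a in range(n) for b in range(n) for c in range(n))

# Hypothetical data: antidiagonal metric and the symmetric tensor with
# A_112 = A_121 = A_211 = 1 (indices 0-based in the code).
g = [[0, 1], [1, 0]]
g_inv = [[0, 1], [1, 0]]
A = [[[0, 1], [1, 0]], [[1, 0], [0, 0]]]

C = structure_constants(A, g_inv)
u, v, w = [1, 2], [3, -1], [2, 5]
left = pairing(g, multiply(C, u, v), w)
right = pairing(g, u, multiply(C, v, w))
middle = cubic_form(A, u, v, w)
```

In this example the first basis vector acts as a unit and the second squares to zero, so the tangent algebra is the algebra of dual numbers.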
Definition 105
A pre-Frobenius manifold is a manifold \( \mathcal{M}\) equipped with the above properties.
Certain additional requirements on the algebraic structure of the tangent sheaf \( (\mathcal{T}_\mathcal{M}, \circ)\) lead to having a Frobenius manifold.
Two important axioms must be kept in mind; we state them below.
The first is the axiom of potentiality, which requires the existence of a family of local potentials \( \Phi\) such that:
This axiom is particularly important regarding the relations to the Monge–Ampère equations.
If \( \mathscr{D}\) is a strictly convex bounded subset of \( \mathbb{R}^n\) then for any nonnegative function \( f\) on \( \mathscr{D}\) and continuous \( \tilde{g}:\partial \mathscr{D} \to \mathbb{R}\) there is a unique convex smooth function \( \Phi\in C^{\infty}( \mathscr{D})\) such that
\[ \det \mathrm{Hess}(\Phi) = f \]
(74)
in \( \mathscr{D}\) and \( \Phi=\tilde{g}\) on \( \partial \mathscr{D}\) .
An elliptic Monge–Ampère equation domain refers to the geometric data generated by \( (\mathscr{D}, \Phi)\) , where
such that Eq. (74) is satisfied.
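As a small numerical illustration, assume that Eq. (74) has the standard form \( \det \mathrm{Hess}(\Phi) = f\) . The sketch below estimates the Hessian determinant of a convex function of two variables by central differences; the two potentials and the evaluation point are hypothetical examples.

```python
def hessian_det(phi, x, y, h=1e-3):
    """Central-difference estimate of det Hess(phi) at the point (x, y)."""
    pxx = (phi(x + h, y) - 2.0 * phi(x, y) + phi(x - h, y)) / h ** 2
    pyy = (phi(x, y + h) - 2.0 * phi(x, y) + phi(x, y - h)) / h ** 2
    pxy = (phi(x + h, y + h) - phi(x + h, y - h)
           - phi(x - h, y + h) + phi(x - h, y - h)) / (4.0 * h ** 2)
    return pxx * pyy - pxy ** 2

def paraboloid(x, y):
    """Phi = (x^2 + y^2) / 2, whose Hessian is the identity matrix."""
    return 0.5 * (x * x + y * y)

def skewed(x, y):
    """Phi = x^2 + x y + y^2, whose Hessian is [[2, 1], [1, 2]]."""
    return x * x + x * y + y * y

f_paraboloid = hessian_det(paraboloid, 0.3, -0.2)
f_skewed = hessian_det(skewed, 0.3, -0.2)
```

Both potentials are convex and solve the equation with a constant right-hand side, \( f = 1\) and \( f = 3\) respectively.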
We state the following result:
Theorem 25
A potential pre-Frobenius manifold satisfies everywhere locally the Monge–Ampère equation. In other words, a potential pre-Frobenius manifold can be identified with an (elliptic) Monge–Ampère domain.
Exercise 40
Prove the statement above.
An associative pre-Frobenius manifold is a pre-Frobenius manifold whose multiplication is associative:
\[ (X \circ Y) \circ Z = X \circ (Y \circ Z), \]
where \( X,Y,Z\) are vector fields. We will discuss this axiom fully in the context of Frobenius manifolds, below. A Frobenius manifold is a pre-Frobenius manifold where the axioms of potentiality and associativity both hold.
To derive the Witten-Dijkgraaf-Verlinde-Verlinde (WDVV) equation, let us rewrite the associativity of the multiplication \( \circ\) :
This yields a non-linear system of associativity equations, which are partial differential equations for the potential \( \Phi\) . For all \( a, b, c, d\) , these equations are written as:
These equations are highly non-linear and of third order.
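For a cubic potential the third derivatives are constants, and the associativity equations become algebraic. The sketch below assumes the WDVV equations in the form \( \sum_{e,f} \Phi_{abe}\, g^{ef}\, \Phi_{fcd} = \sum_{e,f} \Phi_{bce}\, g^{ef}\, \Phi_{fad}\) , valid in flat coordinates with a constant metric. The two potentials and the function names are hypothetical.

```python
from itertools import permutations, product

def third_derivatives(n, entries):
    """Fully symmetric tensor T[a][b][c] from entries {(a, b, c): value}."""
    T = [[[0] * n for _ in range(n)] for _ in range(n)]
    for idx, val in entries.items():
        for a, b, c in permutations(idx):
            T[a][b][c] = val
    return T

def wdvv_residual(T, g_inv):
    """Largest violation of the assumed WDVV identity over all index choices."""
    n = len(T)

    def contract(a, b, c, d):
        return sum(T[a][b][e] * g_inv[e][f] * T[f][c][d]
                   for e in range(n) for f in range(n))

    return max(abs(contract(a, b, c, d) - contract(b, c, a, d))
               for a, b, c, d in product(range(n), repeat=4))

antidiagonal = [[0, 1], [1, 0]]
identity = [[1, 0], [0, 1]]

# Phi = x1^2 x2 / 2: the only nonzero third derivative is Phi_112 = 1.
T_dual = third_derivatives(2, {(0, 0, 1): 1})
# Phi = (x1^3 + x2^3) / 6: Phi_111 = Phi_222 = 1.
T_cubes = third_derivatives(2, {(0, 0, 0): 1, (1, 1, 1): 1})

res_dual = wdvv_residual(T_dual, antidiagonal)
res_cubes_antidiagonal = wdvv_residual(T_cubes, antidiagonal)
res_cubes_identity = wdvv_residual(T_cubes, identity)
```

The same potential \( (x_1^3 + x_2^3)/6\) satisfies the equations for the identity metric but not for the antidiagonal one, which shows that associativity is a joint condition on the potential and the metric.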
Let \( \mathcal{M}\) be a manifold. A Frobenius algebra \( (\mathcal{A}, \circ)\) over a field \( \mathbb{K}\) is a commutative, associative, and unital algebra with multiplication \( \circ\) , equipped with a symmetric bilinear form \( \langle -, - \rangle\) satisfying:
\[ \langle x \circ y, z \rangle = \langle x, y \circ z \rangle \]
for all \( x, y, z \in \mathcal{A}\) .
Definition 106
A Frobenius manifold is an associative potential pre-Frobenius manifold.
A manifold \( \mathcal{M}\) admits the structure of a Frobenius manifold if:
The Euler field \( E\) belongs to the class of affine vector fields. Its existence is inherently tied to the affine structure on \( \mathcal{M}\) . This can be observed through the equivalence of the following statements:
For all vector fields \( Y, Z\) on \( \mathcal{M}\) :
The coefficients of \( E\) are affine functions. Writing \( E = \sum_m E^m \partial_m\) , we have:
where \( a^m_j\) and \( b^m\) are constants in \( \mathbb{R}\) .
Thus, we propose a more concise definition of Frobenius manifolds based on Frobenius bundles. This new approach offers a practical and geometrical perspective.
Remark 22
If we choose local flat coordinates \( (x^a)\) and the corresponding local basis of tangent fields \( \partial_a\) , then:
and the compatibility of \( \Phi\) and \( g\) implies:
where
Here,
with \( g^{ab}\) interpreted as the inverse metric tensor.
Consider a pre-Frobenius manifold given by a triple \( (\mathcal{M}, g, A)\) . Define the following geometrical objects:
A connection:
where \( \nabla_0\) is determined by the condition that flat fields are \( \nabla_0\) -horizontal.
A pencil of connections depending on a parameter \( \lambda\) :
This is called the structure connection of \( (\mathcal{M}, g, A)\) .
Theorem 26 (Manin)
Let \( \nabla_{\lambda}\) be the structure connection of a pre-Frobenius manifold \( (\mathcal{M}, g, A)\) . Then \( (\mathcal{M}, g, A)\) is a Frobenius manifold if and only if the pencil \( \{\nabla_\lambda\}\) is flat.
Exercise 41
Prove Theorem 26.
We give a glimpse of how, in statistical manifolds, one can uncover the structure of a Frobenius algebra on the tangent space. This short explanation will be developed further in the next chapters; here it serves only as a guide to the intuition behind the construction.
Let \( \bar{T} = T \cdot g^{-1}\) denote the mixed \( (1, 2)\) tensor of third rank (that is, the tensor that is once contravariant and twice covariant). In components, this is defined by:
where \( g\) is the metric tensor, compatible with the affine connection on the manifold under consideration.
We illustrate the construction of the operation \( \circ\) , defined on \( \mathcal{T}_\mathsf{S}\) , where \( \mathsf{S}\) is a statistical manifold of exponential type and of finite dimension.
Generally:
where \( \{a^i\}\) form a dual basis to \( \{\partial_j\ell_{\theta}\}\) . The hidden multiplication structure is explained by the following theorem:
Theorem 27
The tensor \( \bar{T}\) defines a multiplication \( \circ\) on \( \mathcal{T}_{P_{\theta}} \mathsf{S}\) , as follows:
and for \( u, v \in \mathcal{T}_{P_{\theta}}\mathsf{S}\) :
The following lemma aids in understanding this hidden multiplication structure:
Lemma 12
For \( u, v, w \in \mathcal{T}_{P_{\theta}}\mathsf{S}\) :
We summarize the key elements of a pre-Frobenius manifold in the example of statistical manifolds:
A multiplication defined by:
where locally,
Metric invariance:
We take as our point of departure the classical notion of a semisimple Frobenius manifold, formulated within the established framework. Semisimple Frobenius manifolds are interesting in relation to configuration spaces and Saito spaces.
This preliminary exposition serves as a scaffolding upon which a more intrinsic and geometrically transparent reformulation will be erected. The latter will emerge naturally from a careful reconsideration of the underlying structures, unveiling with clarity the interplay between the metric, the associative multiplication, and the coherence conditions that bind them into a unified whole.
Let \( (\mathcal{M}, g, A)\) be a triple, where \( \mathcal{M}\) is an associative pre-Frobenius manifold of dimension \( n\) . We introduce the following definition, which will play a fundamental role in the structure theory of such manifolds:
Definition 107
The manifold \( \mathcal{M}\) is said to be semisimple (or split semisimple) if there exists an isomorphism of sheaves of \( \mathcal{O}_M\) -algebras
Here, \( \circ\) denotes the multiplication on \( \mathcal{T}_\mathcal{M}\) , while \( \cdot\) represents componentwise multiplication in \( \mathcal{O}_M^n\) . The isomorphism is required to exist everywhere locally (or globally).
Let \( (e_1, e_2, \dots, e_n)\) be a local basis of \( \mathcal{T}_\mathcal{M}\) . In this local basis, the multiplication takes the form:
In the simplest case, this reduces to
Thus, the basis \( (e_i)\) provides a well-defined (up to renumbering) family of idempotents. If \( \mathcal{M}\) is semisimple, there exists an unramified covering of \( \mathcal{M}\) (of degree at most \( n!\) ) on which the induced pre-Frobenius structure becomes a splitting structure.
Let \( (e_i)\) be a local coordinate basis and let \( (\epsilon^i)\) be its dual basis of \( 1\) -forms. The structure tensor \( A\) , encoding the pre-Frobenius structure, is given by the condition:
where \( g\) is the flat metric and \( \circ\) denotes the associative product on \( \mathcal{T}_\mathcal{M}\) . Since the basis vectors satisfy \( e_i \circ e_j = \delta_{ij} e_i\) , it follows, by a direct application of the above identity, that
We introduced the notation \( \eta_i\) , for the diagonal components of the metric, so that \( \eta_i=g(e_i, e_i)\) .
With this notation, the three-tensor \( A\) takes the form:
exhibiting its diagonalizability in a chosen coordinate system.
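The diagonal form of the structure tensor can be checked numerically at a single point. The following is a minimal sketch, assuming the convention \( A(X, Y, Z) = g(X \circ Y, Z)\) , a basis of idempotents with \( e_i \circ e_j = \delta_{ij} e_i\) , and arbitrary illustrative values for \( \eta_i = g(e_i, e_i)\) ; the names `ETA`, `circ`, `A` and `co_identity` are hypothetical. It also evaluates the 1-form \( g(e, \cdot)\) attached to the identity \( e = \sum_i e_i\) .

```python
from itertools import product

# Hypothetical values of eta_i = g(e_i, e_i) at one point of the manifold.
ETA = [2, -1, 3]
N = len(ETA)

def basis(i):
    """Coordinates of the idempotent e_i."""
    return [1 if k == i else 0 for k in range(N)]

def circ(x, y):
    """Product in the idempotent basis: e_i o e_j = delta_ij e_i."""
    return [a * b for a, b in zip(x, y)]

def g(x, y):
    """Diagonal metric with entries eta_i."""
    return sum(eta * a * b for eta, a, b in zip(ETA, x, y))

def A(x, y, z):
    """Structure tensor, assuming A(X, Y, Z) = g(X o Y, Z)."""
    return g(circ(x, y), z)

# Identity element e = e_1 + ... + e_n of the product.
UNIT = [1] * N

def co_identity(x):
    """The 1-form g(e, .) evaluated on x."""
    return g(UNIT, x)

# Components A_ijk in the idempotent basis.
COMPONENTS = {
    (i, j, k): A(basis(i), basis(j), basis(k))
    for i, j, k in product(range(N), repeat=3)
}
```

In this sketch the only non-zero components are `COMPONENTS[(i, i, i)] == ETA[i]`, and `co_identity(basis(i))` returns `ETA[i]`, so the co-identity has components \( \eta_i\) in the dual basis.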
Finally, considering the identity element \( e = \sum_{i=1}^n e_i\) in \( (\mathcal{T}_\mathcal{M}, \circ)\) , we obtain the corresponding co-identity:
As a consequence of the above discussion, we are able to reformulate our initial definition differently.
Definition 108
A semisimple Frobenius structure on a smooth manifold \( \mathcal{M}\) consists of the following data:
Remark 23
It is interesting to note that while the conditions of potentiality and flatness imposed on \( g\) are of a non-trivial nature, the associativity of the product structure on \( \mathcal{T}_\mathcal{M}\) follows automatically under these hypotheses.
We are now in a position to articulate a fundamental characterization of Frobenius structures within this formalism. This characterization, which encapsulates the essential interplay between the multiplication, the metric, and their compatibility conditions, will serve as a guiding principle in the subsequent development of the theory.
Theorem 28
The semisimple pre-Frobenius structure on \( \mathcal{M}\) defines a Frobenius structure if and only if:
The function \( \eta\) is referred to as the potential metric of the structure. This metric corresponds to a Hessian metric of the form:
where \( \Phi\) is the potential. Note that the canonical coordinates \( \{u^i\}\) are defined up to renumbering and constant shifts.
Proof
Consider the structure connection \( \nabla_\lambda\) of a pre-Frobenius manifold. According to Theorem 26, the manifold \( \mathcal{M}\) is Frobenius if and only if the curvature \( \nabla_\lambda^2\) vanishes. This is equivalent to the following identity:
(75)
Since \( \mathcal{M}\) is assumed to be associative and \( g\) is flat, we only need to consider the \( \lambda\) -linear terms in equation (75). Let \( \{e_i\}\) be a basis, and let \( \Gamma_{ik}^j\) denote the coefficients of the Riemannian connection:
(76)
Since the structure connections are given by \( \nabla_{\lambda, X}(Y) = \nabla_{0, X}(Y) + \lambda X \circ Y\) , the left-hand side of (75) produces the \( \lambda\) -term:
(77)
Substituting (76) into (77), we find that the \( \lambda\) -term reduces to:
(78)
Now consider the Lie bracket \( [e_i, e_j] = \sum\limits_q f_{ij}^q e_q\) . The \( \lambda\) -term in the right-hand side of (75) becomes:
The coefficients of \( e_k\) vanish in (78). If \( \mathcal{M}\) carries a Frobenius structure, then the equality in (75) holds, so that \( f_{ij}^k=0\) .
Thus, the basis elements \( e_i\) pairwise commute, and local canonical coordinates \( \{u^i\}\) exist.
For the Levi-Civita connection of the metric \( g = \sum g_{ij} du^i du^j\) , the connection coefficients are given by:
where:
For the metric \( g = \sum \eta_i (du^i)^2\) , the non-vanishing coefficients are:
Therefore,
and
(79)
Finally, the vanishing of the \( \lambda\) -terms implies the following fundamental identity, valid for all indices \( i, j, k\) :
(80)
Applying (79) directly, one observes that (80) is identically satisfied for \( i=j\) , as well as in the case where \( i, j, k\) are pairwise distinct. However, in the particular case \( i \neq j = k\) , one obtains the relation
By symmetry, the same condition must hold for \( k = i \neq j\) , leading to the conclusion that
for some \( \eta\) , defined at least locally. This verifies the required condition in all cases and thus establishes the result.
We now turn to an analysis of the geometric properties of the pencil of connections \( \nabla_\lambda\) , particularly in relation to the structures of a pre-Frobenius and Frobenius manifold.
Let \( \nabla_\lambda\) denote the structure connection associated with the pre-Frobenius manifold \( (\mathcal{M}, g, A)\) . The curvature of this connection satisfies a quadratic relation in the parameter \( \lambda\) , taking the form:
where \( R_1, R_2, R_3\) are curvature terms determined by the pre-Frobenius structure. A key observation is that the term \( R_3 \) coincides with \( \nabla_0^2\) , which satisfies \( \nabla_0^2 = 0\) because \( \nabla_0\) is flat, leading to the simplification:
This relation gives rise to the following fundamental theorem, characterizing the Frobenius condition in terms of the vanishing of specific curvature terms.
Theorem 29
Let \( \nabla_\lambda\) be the structure connection associated with the pre-Frobenius manifold \( (\mathcal{M}, g, A)\) . Then:
Thus, the manifold \( (\mathcal{M}, g, A)\) is Frobenius, that is, it satisfies the full Frobenius condition, if and only if the pencil of connections \( \nabla_\lambda\) is flat.
Exercise 42
Write a proof.
In this subsection, we discuss the construction of flat \( 3\) -webs via semisimple \( 3\) -dimensional Frobenius manifolds and provide a geometric interpretation of the Chern connection associated with these webs. We show that these webs are biholomorphic to the characteristic webs on the solutions of the corresponding associativity equations. Furthermore, these webs are hexagonal and possess at least one infinitesimal symmetry at each singular point.
Consider locally trivial fibrations of class \( C^k\) . Such a fibration is given by a triple \( \lambda = (Y,X,\pi)\) , where \( \pi:Y\to X\) is a projection of \( Y\) onto \( X\) , and \( Y\) and \( X\) are differentiable (smooth) manifolds of respective dimensions \( m\) and \( n\) with \( m > n\) .
Given our interest in the local structure of such manifolds we can often think of them as connected domains of a Euclidean space of the same dimension.
Let \( T_{p}(Y)\) be the tangent space of \( Y\) at a point \( {p}\) . A fibre \( F_x\) passing through \( {p}\) determines in \( T_{p}(Y)\) a subspace \( T_{p}(F_x)\) , of codimension \( r\) say, tangent to \( F_x\) at \( {p}\) . We provide \( T_{p}(Y)\) with a local moving frame \( \{e_i, e_\alpha; i=1,...,r; \alpha =r+1,...,m\}\) , where \( \dim F = m - r\) and \( \dim X = r\) . We then obtain a co-frame \( \{\omega^i, \omega^\alpha\}\) dual to \( \{e_i, e_\alpha\}\) , such that \( \omega^i(e_{\alpha})=0\) . The fibres of the fibration are integral manifolds of the system of equations
Since there exists a unique fibre \( F\) through a point \( {p}\in Y\) , the system above \( \{\omega^i=0, i=1,…, r\}\) is completely integrable. By the Frobenius theorem, the integrability condition is given by
where \( \phi^i_j\) are differential forms.
Assume \( Y\) is an \( m\) -dimensional manifold. A \( k\) -dimensional distribution \( \theta\) on \( Y\) , \( 0\leq k\leq m\) , is a smooth field of \( k\) -dimensional tangential directions: to each point \( {p} \in Y\) it assigns a linear \( k\) -dimensional subspace of the tangent space \( T_{p}(Y)\) . When the distribution is integrable, its integral surfaces are called the leaves of the foliation, and the number \( k\) is the dimension of the foliation.
Let \( Y=X\) be a differentiable manifold of dimension \( nr\) . We say that a \( d\) -web \( W(d,n,r)\) of codimension \( r\) is given in an open domain \( D\subset Y\) by a set of \( d\) foliations of codimension \( r\) which are in general position.
We begin by stating the following two theorems.
Theorem 30
Let the implicit cubic ODE
have a flat web of solutions and satisfy the regularity condition at \( m = (x_0, y_0, z_0) \in \mathcal{C} \subset \mathsf{S}\) . Then there exists a local diffeomorphism around \( \pi(m) = (x_0, y_0)\) that reduces the ODE to:
Theorem 31
Let the implicit cubic ODE
have a flat web of solutions and satisfy the regularity condition at \( m = (x_0, y_0, z_0) \in \mathcal{C} \subset \mathsf{S}\) . Then there exists a local diffeomorphism around \( \pi(m) = (x_0, y_0)\) that reduces the ODE to:
The concept of coordinates is foundational in geometry. In this section, we examine flat and canonical coordinates. Flat coordinates play a crucial role in the theory of Frobenius manifolds.
The problem of finding flat coordinates is essential in studying the Gauss–Manin systems, which are deeply connected to the differential equations for the integrals of basic differential forms over vanishing cycles associated with a given singularity. These systems also relate closely to the theory of primitive forms. The Gauss–Manin connection can be understood as a way to differentiate cohomology classes with respect to parameters.
Let \( (\mathcal{U}, x, g)\) be a triple, where \( \mathcal{U}\) is a domain in \( \mathbb{R}^n\) , \( x = (x^1, …, x^n)\) are local coordinates, and \( g\) is a metric. Let \( y = (y^1, …, y^n)\) be another system of coordinates. Since \( g\) is a symmetric tensor of rank \( 2\) , the metric can be written in both coordinate systems as:
We say that the coordinates are flat if the matrix associated with \( g\) is constant.
Consider the metric \( g\) , and let \( T_x \mathcal{M}\) denote the tangent space of the manifold \( \mathcal{M}\) at \( x\) . Let \( v \in T_x \mathcal{M}\) . Define the kernel of the metric \( g\) as
At any point \( x\) , there exist functions \( \Gamma_{ijk}\) and \( \Gamma_{ij}^k\) , defined as:
Then, the following theorem holds.
Theorem 32
For any \( i, j, k \) , the condition
(81)
is equivalent to the condition:
(82)
where \( v \in \ker(g) \) and \( \Gamma_{ki}^i = \Gamma_{ik}^i \) .
If the rank of \( g \) is constant and condition (82) holds, then there exists a smooth function \( \Gamma_{ki}^i(x) \) satisfying (81).
Proof
Fix a point \( x = (x^1, …, x^n)\) . Consider the system of equations
This is a linear system in the unknowns \( \Gamma_{ij}^l\) , \( \Gamma_{kj}^l\) , with \( g_{kl}\) , \( g_{il}\) , and \( \frac{\partial g_{ij}}{\partial x^k}\) as coefficients. Let \( A\) be the coefficient matrix, \( y\) the vector of unknowns, and \( b\) the vector of constants. The system has a solution if and only if, for every vector \( a\) such that \( a^\top A = 0\) , it holds that \( a^\top b = 0\) . This leads to:
From this, we deduce that:
Finally, we have the following theorem characterizing the existence of flat coordinates.
Theorem 33
Flat coordinates for the metric \( g\) exist if and only if there exist smooth functions \( \Gamma_{ij}^k(x)\) , symmetric in \( i\) and \( j\) , satisfying:
The compatibility condition:
The vanishing of the Riemann curvature tensor:
where
Exercise 39. Using the relations in (72) and (83), it is easy to see that one can obtain the structure of a pre-Lie algebra on the tangent sheaf.
Exercise 40 is easily shown by using the paper of Calabi (1954)[23] and the definition of a potential pre-Frobenius manifold.
Exercises 41 and 42. The proof proceeds in two stages. Part 1.
We calculate the coefficient of the \( \lambda\) term, \( R_1\) , in the following expression:
It follows immediately that \( R_1 = 0\) if and only if, for any \( a, b, c, d\) and for a mixed \( (1, 2)\) rank tensor \( A_{bc}^e\) ,
or equivalently, for a rank-3 covariant tensor:
When \( A\) is a potential, the symmetry of the rank-3 tensor \( A(X, Y, Z)\) , written as \( A(X, Y, Z) = (XYZ)\Phi\) , ensures that the above condition holds.
Suppose that the relation \( \partial_a A_{bcd} = (-1)^{ab} \partial_b A_{acd}\) is true. Then, for all \( c, d\) , the form \( \sum_b dx^b A_{bcd}\) is closed. Locally, we can therefore find functions \( B_{cd}\) with \( A_{bcd} = \partial_b B_{cd}\) , satisfying \( B_{cd} = (-1)^{cd} B_{dc}\) . Taking into account the symmetry of the rank-3 covariant tensor \( A\) , we obtain:
Hence, for any \( d\) , the form \( \sum_c dx^c B_{cd}\) is closed. Analogously, we find locally \( B_{cd} = \partial_c C_d\) . Finally, since \( C_d\) can be written locally as \( \partial_d \Phi\) , it follows that \( A_{bcd} = \partial_b \partial_c \partial_d \Phi\) .
This completes the first part of the proof.
Part 2. We compute the coefficient of the \( \lambda^2\) term in \( [\nabla_{\lambda, X}, \nabla_{\lambda, Y}](Z)\) :
If the multiplication \( \circ\) is associative, then \( R_2 = 0\) because \( \circ\) is always commutative.
Conversely, if \( R_2 = 0\) , then:
Thus, associativity of \( \circ\) follows. This completes the proof of the theorem.
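The link between the \( \lambda^2\) -coefficient and associativity can be tested on small examples. The sketch below assumes the even (non-super) case and takes \( R_2(X, Y)Z = X \circ (Y \circ Z) - Y \circ (X \circ Z)\) ; since the displayed formula is not reproduced here, this expression is an assumption. The two multiplication tables are illustrative: one is semisimple, the other is commutative but not associative.

```python
from itertools import product

def make_product(table, n):
    """Build a bilinear product from a table mapping (i, j) to the coordinates of e_i o e_j."""
    def circ(x, y):
        out = [0] * n
        for i, j in product(range(n), repeat=2):
            for k, c in enumerate(table[(i, j)]):
                out[k] += x[i] * y[j] * c
        return out
    return circ

def r2(circ, x, y, z):
    """Assumed lambda^2 curvature term: X o (Y o Z) - Y o (X o Z)."""
    left = circ(x, circ(y, z))
    right = circ(y, circ(x, z))
    return [a - b for a, b in zip(left, right)]

def r2_vanishes(circ, n):
    """R_2 is trilinear, so it suffices to test it on basis vectors."""
    basis = [[1 if k == i else 0 for k in range(n)] for i in range(n)]
    return all(not any(r2(circ, x, y, z)) for x, y, z in product(basis, repeat=3))

# Semisimple table: e_i o e_j = delta_ij e_i.
SEMISIMPLE = {(0, 0): [1, 0], (0, 1): [0, 0], (1, 0): [0, 0], (1, 1): [0, 1]}

# Commutative but non-associative table: e_1 o e_1 = e_2, e_2 o e_2 = e_1, e_1 o e_2 = 0.
NON_ASSOCIATIVE = {(0, 0): [0, 1], (0, 1): [0, 0], (1, 0): [0, 0], (1, 1): [1, 0]}
```

With these tables, `r2_vanishes(make_product(SEMISIMPLE, 2), 2)` holds, while the second table gives a non-zero value of `r2` on \( (e_1, e_2, e_2)\) , in line with the equivalence between \( R_2 = 0\) and associativity for a commutative product.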
In this chapter, and the following one, we reveal the intricate (and often hidden!) geometric structures underlying statistical manifolds. In particular, we revisit certain geometric concepts that have long been overlooked, such as \( m\) -pairs, and demonstrate their relevance in this context. By bringing these ideas back into focus, we aim to provide a deeper understanding of the rich interplay between geometry and statistical structures.
Let us consider an object \( X_{d}\) , a \( d\) -dimensional surface residing in an \( n\) -dimensional projective space \( \mathbb{P}^{n}\) , where the constraint \( d \leq n\) holds.
Definition 109
We say that the surface \( X_{d}\) is normalized if, for every point \( p \in X_{d}\) , we associate two distinct hyperplanes:
Example 23
Consider the case where \( d = 2\) and \( n = 3 \) . This means we have a 2-dimensional surface \( X_2\) within the 3-dimensional projective space \( \mathbb{P}^{3}\) .
At a point \( p\) on \( X_2 \) :
This embodies a duality intrinsic to projective geometry. Notably, in the case where \( d = n\) , the hyperplane \( P_{I}\) is identified with the point \( p\) , and \( P_{II}\) becomes the \( (n-1)\) -dimensional surface devoid of the point \( p\) . This situation reflects the classical notion of duality in projective spaces, leading to the identification of \( X_{n}\) with the projective space \( \mathbb{P}^{n}\) .
Definition 110
We define an \( m\) -pair as a pair constituted of an \( m\) -plane and an \( (n-m-1)\) -plane.
More precisely, an m-pair is a pair consisting of:
These two planes are typically considered in the context of projective geometry or linear algebra, where they may satisfy certain geometric or algebraic relationships.
Exercise 43
Draw an \( m\) -pair in the low dimensional cases.
In projective geometry, an \( m \) -pair can be used to describe configurations of points, lines, and planes at infinity.
The concept of \( m \) -pairs is closely related to Grassmannians, which are spaces that parameterize all \( m \) -dimensional subspaces of an \( n\) -dimensional space. The study of \( m \) -pairs can be seen as a way to explore the relationships between different Grassmannians.
Exercise 44
Determine the relation between Grassmannians and \( m\) -pairs.
In computer vision, \( m\) -pairs can be used to model the relationship between different views of a scene. For example, in structure from motion, the relationship between 2D image planes (2-planes) and 3D space (3-planes) can be analyzed using the concept of \( m \) -pairs.
Exercise 45
Using your favourite software, generate an example of \( m\) -pairs modelling the relationship between different views of a scene.
The examples above illustrate how \( m \) -pairs can be applied in various contexts, from projective geometry to computer vision.
Normalized surfaces associated with an \( m\) -pair space possess the following properties:
Lemma 13
Paracomplex numbers are a generalization of complex numbers, where instead of the imaginary unit \( i \) satisfying \( i^2 = -1 \) , the paracomplex unit \( \epsilon \) satisfies \( \epsilon^2 = 1 \) . The algebra of paracomplex numbers is defined as follows:
A paracomplex number is an element of the form \( z = x + \epsilon y \) ,
where \( x, y \in \mathbb{R} \) are real numbers, and \( \epsilon \) is the paracomplex unit satisfying \( \epsilon^2 = 1 \) , \( \epsilon \neq \pm 1 \) .
The set of all paracomplex numbers is denoted by \( \mathfrak{P}\) .
The algebra of paracomplex numbers \( \mathfrak{P} \) is a two-dimensional commutative algebra over the real numbers \( \mathbb{R}\) . It is isomorphic to the direct sum \( \mathbb{R} \oplus \mathbb{R} \) , and its multiplication rule is given by \( (x_1 + \epsilon y_1)(x_2 + \epsilon y_2) = (x_1 x_2 + y_1 y_2) + \epsilon (x_1 y_2 + y_1 x_2) \) .
Idempotent Basis: Paracomplex numbers can be expressed in terms of idempotent elements. Define \( e_\pm = \frac{1}{2}(1 \pm \epsilon) \) .
These elements satisfy \( e_+^2 = e_+ \) , \( e_-^2 = e_- \) , and \( e_+ e_- = 0 \) . Any paracomplex number \( z = x + \epsilon y \) can be written as \( z = (x + y)\, e_+ + (x - y)\, e_- \) .
Conjugation: The paracomplex conjugate of \( z = x + \epsilon y \) is defined as \( \bar{z} = x - \epsilon y \) .
Norm: The norm of a paracomplex number \( z = x + \epsilon y \) is given by \( z \bar{z} = x^2 - y^2 \) .
Note that this norm is not positive definite, as it can take negative values.
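The algebraic rules above are easy to experiment with. The following is a minimal sketch, assuming the idempotents \( e_\pm = \frac{1}{2}(1 \pm \epsilon)\) and the quadratic form \( z\bar{z} = x^2 - y^2\) as the norm; the class name `Paracomplex` and its methods are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Paracomplex:
    """The number x + eps*y, where eps*eps = 1."""
    x: Fraction
    y: Fraction

    def __add__(self, other):
        return Paracomplex(self.x + other.x, self.y + other.y)

    def __mul__(self, other):
        # (x1 + eps y1)(x2 + eps y2) = (x1 x2 + y1 y2) + eps (x1 y2 + y1 x2)
        return Paracomplex(
            self.x * other.x + self.y * other.y,
            self.x * other.y + self.y * other.x,
        )

    def conj(self):
        """Paracomplex conjugate x - eps*y."""
        return Paracomplex(self.x, -self.y)

    def norm(self):
        """Indefinite quadratic form z * conj(z) = x^2 - y^2."""
        return self.x * self.x - self.y * self.y

    def idempotent_coords(self):
        """Coefficients (x + y, x - y) of z in the basis (e_plus, e_minus)."""
        return (self.x + self.y, self.x - self.y)

ZERO = Paracomplex(Fraction(0), Fraction(0))
ONE = Paracomplex(Fraction(1), Fraction(0))
EPS = Paracomplex(Fraction(0), Fraction(1))
E_PLUS = Paracomplex(Fraction(1, 2), Fraction(1, 2))
E_MINUS = Paracomplex(Fraction(1, 2), Fraction(-1, 2))
```

For example, `E_PLUS * E_MINUS == ZERO` exhibits zero divisors, and `Paracomplex(Fraction(1), Fraction(2)).norm()` is negative, so the norm is indeed indefinite. In the idempotent coordinates the product becomes componentwise, which is the isomorphism with \( \mathbb{R} \oplus \mathbb{R}\) .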
Exercise 46
Show that \( (a+a\epsilon)\cdot(b-b\epsilon)=0\) .
A module over a paracomplex algebra generalizes the concept of a vector space, where the scalars are paracomplex numbers instead of real or complex numbers.
Let \( \mathfrak{P}\) be the algebra of paracomplex numbers. A module over \( \mathfrak{P}\) is an abelian group \( M\) together with a scalar multiplication:
satisfying the following properties for all \( z, z_1, z_2 \in \mathfrak{P}\) and \( m, m_1, m_2 \in M \) :
Paracomplex Vector Space: The simplest example of a module over \( \mathfrak{P}\) is \( \mathfrak{P}^n\) , the set of \( n\) -tuples of paracomplex numbers. Scalar multiplication is defined component-wise:
Decomposition into Real Submodules: Using the idempotent basis \( e_+ \) and \( e_- \) , any module \( M\) over \( \mathfrak{P} \) can be decomposed into two real submodules:
Here, \( e_+ M \) and \( e_- M \) are real vector spaces, and the action of \( \mathfrak{P}\) on \( M \) is determined by the actions of \( e_+ \) and \( e_- \) .
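The decomposition can be made explicit on the module \( \mathfrak{P}^n\) . The sketch below stores a paracomplex number \( x + \epsilon y\) as a pair `(x, y)` and assumes \( e_\pm = \frac{1}{2}(1 \pm \epsilon)\) ; the function names `pmul`, `scale`, `vadd` and `split` are hypothetical.

```python
from fractions import Fraction

def pmul(z, w):
    """Product of paracomplex numbers stored as pairs (x, y) = x + eps*y."""
    return (z[0] * w[0] + z[1] * w[1], z[0] * w[1] + z[1] * w[0])

def scale(z, m):
    """Componentwise scalar multiplication of m in P^n by the scalar z."""
    return [pmul(z, c) for c in m]

def vadd(m1, m2):
    """Componentwise sum in P^n."""
    return [(a[0] + b[0], a[1] + b[1]) for a, b in zip(m1, m2)]

EPS = (Fraction(0), Fraction(1))
E_PLUS = (Fraction(1, 2), Fraction(1, 2))
E_MINUS = (Fraction(1, 2), Fraction(-1, 2))

def split(m):
    """Return the components (e_plus * m, e_minus * m) of m."""
    return scale(E_PLUS, m), scale(E_MINUS, m)

# Illustrative element of P^2.
M_EXAMPLE = [(Fraction(1), Fraction(2)), (Fraction(3), Fraction(-4))]
M_PLUS, M_MINUS = split(M_EXAMPLE)
```

Here `vadd(M_PLUS, M_MINUS)` recovers `M_EXAMPLE`, the unit \( \epsilon\) fixes `M_PLUS` and reverses the sign of `M_MINUS`, and `scale(E_PLUS, M_MINUS)` is zero, so the two summands are complementary.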
Modules over paracomplex algebras appear in various areas of mathematics and physics, including:
The algebra of paracomplex numbers has a remarkable bearing on the manifold of probability distributions, which we discuss in the following propositions and statements. We begin with the following salient proposition:
Proposition 8
The space of \( 0\) -pairs within the projective space \( \mathbb{P}^{n}\) is isometric to the hermitian projective space over the algebra of paracomplex numbers.
Proof
Refer to section 4.4.5 of [24] for a detailed proof.
Proposition 9
Let \( (X, \mathcal{F})\) be a finite measurable set with dimension \( n+1\) , where measures vanish exclusively on an ideal \( \mathcal{I}\) . Define \( \mathcal{H}_{n}\) as the space of probability distributions on \( (X, \mathcal{F})\) . It follows that the space \( \mathcal{H}_{n}\) embodies a manifold of \( 0\) -pairs.
Proof
The \( n\) -dimensional surface \( \mathcal{H}_{n}\) arises as the intersection of the hyperplane constrained by \( \mu(X) = 1\) and the cone \( \mathcal{C}_{n+1}\) of strictly positive measures within the affine space \( \mathcal{W}_{n+1}\) of signed bounded measures. It is interpreted as an \( n\) -dimensional surface within the projective space \( \mathbb{P}^{n}\) . The geometrical structure of this surface is thus inherited from projective geometry. By invoking the remark from the initial paragraph of section 0.4.3 in [24] alongside Definition 110 of \( 0\) -pairs, we conclude the correspondence with a manifold of \( 0\) -pairs.
Theorem 34
Consider \( (X, \mathcal{F})\) as a finite measurable set with dimension \( n+1\) , where measures vanish solely on an ideal \( \mathcal{I}\) . The space \( \mathcal{H}_{n}\) of probability distributions on \( (X,\mathcal{F})\) is isomorphic to the hermitian projective space over the cone \( M_{+}(2,\mathfrak{C})\) .
We introduce Gromov–Witten invariants for statistical manifolds (denoted GWS), extending an analog of classical Gromov–Witten invariants to the realm of information geometry. Originally, these invariants are rational numbers that enumerate (pseudo-)holomorphic curves satisfying specific conditions in a symplectic manifold. In our generalization, the GWS encode fundamental geometric structures of statistical manifolds, reflecting the intersection theory of (para-)holomorphic curves within this framework.
Furthermore, this perspective reveals an intrinsic connection between the geometry of statistical learning and the dynamics of the learning process. The presence or obstruction of certain pseudo-holomorphic structures, as captured by the GWS, provides a criterion for determining whether a learning system successfully acquires information or encounters fundamental limitations, thereby offering a novel geometric approach to the theory of learning.
Gromov–Witten invariants are fundamental numerical invariants in symplectic geometry and algebraic geometry, capturing intersection properties of (pseudo-)holomorphic curves in a given space.
Given a compact symplectic manifold \( (\mathcal{M},\varpi)\) , the Gromov–Witten invariant counts the number of (pseudo-)holomorphic maps
from a compact Riemann surface \( \mathscr{S}\) (with complex structure \( j\) ) into \( \mathcal{M}\) , satisfying certain constraints on their homology class and intersection conditions with given cycles.
The point of view that we adopt here is inspired by quantum cohomology, which is a formal Frobenius manifold.
Let us consider the (formal) Frobenius manifold \( (H,g)\) . We denote by \( k\) a field of characteristic 0 (such as \( \mathbb{C}\) or \( \mathbb{R}\) ). Let \( H\) be a \( k\) -module of finite rank and
an even symmetric pairing (which is non degenerate). We denote \( H^*\) the dual to \( H\) .
An important part of the Frobenius manifold structure is encoded in the existence of a potential function
which governs the multiplication structure on the manifold. In local coordinates, under suitable conditions, this function can be expressed as:
where \( Y_n\in (H^*)^{\otimes n}\) is a symmetric multilinear map
This system of multilinear forms defines a system of abstract correlation functions on the pair \( (H,g)\) , where \( H\) is a vector space equipped with a non-degenerate pairing \( g\) . These functions are symmetric and (in the context of Gromov–Witten theory) correspond to intersection numbers on the moduli space of stable maps. The Gromov–Witten invariants are generated from those multi-linear maps.
In the context of Gromov–Witten invariants, the potential function \( \Phi\) serves as the generating function for the intersection numbers of moduli spaces of holomorphic curves. The symmetric multilinear maps \( Y_n\) correspond to the correlation functions computed via topological field theory techniques.
The function \( \Phi\) satisfies the WDVV equations (associativity conditions on quantum cohomology), which govern the Frobenius manifold structure.
The maps \( Y_n\) define the higher-order correlation functions, whose values give the Gromov–Witten invariants.
This formulation provides a bridge between Frobenius manifolds, quantum cohomology, and Gromov–Witten theory, showing how the potential function encodes geometric intersection theory in an algebraic and formal power series framework. The abstract correlation functions \( Y_n\) serve as the structural foundation from which the Gromov–Witten invariants emerge, linking the geometry of moduli spaces with the algebraic structure of Frobenius manifolds.
In the framework of statistical geometry, we return to the study of statistical manifolds, emphasizing the discrete case of the exponential family. The fundamental relation governing this structure is given by the expansion:
(83)
where:
The family in (83) defines an analytic \( n\) -dimensional hypersurface within the statistical manifold, which can be uniquely determined by \( n+1\) points in general position.
We introduce the notion of Gromov–Witten invariants for statistical manifolds (GWS) as follows:
Definition 111
Let \( k\) be the field of real numbers. Let \( \mathsf{S}\) be the statistical manifold. The Gromov–Witten invariants for statistical manifolds (GWS) are defined from the family of multilinear maps:
Equivalently, these invariants may be written in terms of the generating expansion:
These invariants naturally emerge as part of the potential function \( \tilde{\mathbf{\Phi}}\) , which is identified with the Kullback–Leibler entropy function of the statistical system. The entropy function itself is expressed in the form:
(84)
This formulation suggests a deeper geometric and categorical interpretation of statistical learning, where the intersection theory of statistical structures plays a fundamental role. Within this perspective, the entropy function \( \tilde{\mathbf{\Phi}}\) governs the geometry of statistical families, much like the potential function in Frobenius manifolds or Gromov–Witten theory encodes intersection numbers in moduli spaces of holomorphic curves.
Therefore, we state the following:
Proposition 10
The entropy function \( \tilde{\mathbf{\Phi}}\) of the statistical manifold is intrinsically determined by the Gromov–Witten invariants for statistical manifolds (GWS).
More precisely, \( \tilde{\mathbf{\Phi}}\) arises as a generating function whose coefficients encode the multilinear maps \( \tilde{Y}_n\) , which define the GWS structure. These invariants characterize the underlying statistical geometry by capturing the intersection properties of statistical hypersurfaces.
Proof
Indeed, since \( \tilde{\mathbf{\Phi}}\) in formula (84) relies on the polylinear maps \( \tilde{Y}_n\in \left(-\sum_{j}\beta^jX_j(\omega)\right)^{\otimes n}\) defining the GWS, the statement follows.
We consider the tangent fiber bundle over the statistical manifold \( \mathsf{S}\) , where \( \mathsf{S}\) is the space of probability distributions. This bundle structure encodes the infinitesimal geometry of the statistical space, allowing us to describe variations in probability distributions in terms of a Lie group action.
The tangent fiber bundle is denoted by the quintuple \( (T\mathsf{S},\mathsf{S},\pi,G,F)\) , where:
For any point \( \rho\in \mathsf{S}\) , the tangent space at \( \rho\) , denoted \( T_{\rho}\mathsf{S}\) , is given by:
This identification follows from the fact that infinitesimal perturbations of a probability distribution \( \rho\) can be described by signed measures that respect the probabilistic constraints imposed by \( \mathsf{S}\) . The ideal \( I\) of the \( \sigma\) -algebra corresponds to the subspace of measures that do not contribute to the variations in probability distributions, ensuring consistency with the underlying measure-theoretic structure.
The Lie group \( G\) acts freely and transitively on each fiber of \( T\mathsf{S}\) , meaning that every element of the fiber can be transformed into any other through the group action. The action is given by:
where:
This affine structure on the fibers implies that \( G\) acts as a translation group, ensuring a well-defined parallel transport mechanism in the space of probability measures. Such a structure is crucial for describing information geometry, as it encodes how probability distributions evolve under statistical transformations.
Lemma 14
Consider the fiber bundle \( (T\mathsf{S},\mathsf{S},\pi,G,F)\) where:
Let \( \gamma:\mathcal{I}\to \mathsf{S}\) be a smooth geodesic path in \( \mathsf{S}\) , where \( \mathcal{I}\subset \mathbb{R}\) . The fiber over \( \gamma\) is denoted by \( F_{\gamma}=\pi^{-1}(\gamma)\) , which represents the space of tangent vectors along \( \gamma\) .
Then, the fiber
consists of two disjoint connected components. Each component \( \gamma^+\) and \( \gamma^-\) is contained within a totally geodesic submanifold of \( T\mathsf{S}\) , denoted \( E^+\) and \( E^-\) , respectively.
Proof
Consider the fiber above \( \gamma\) . For any point of \( \mathsf{S}\) , the tangent space is identified with a module over the paracomplex numbers, and this space decomposes into a pair of subspaces (i.e. eigenspaces with eigenvalues \( \pm \epsilon\) ). The geodesic curve in \( \mathsf{S}\) is a path \( \gamma=(\gamma^i(t)): t\in [0,1]\to \mathsf{S}\) . In local coordinates, the fiber bundle is given by \( \{\gamma^{ia}e_{a}\}\) , with \( a\in \{1,2\}\) . Therefore, the fiber over \( \gamma\) has two components \( (\gamma^+,\gamma^-)\) . Taking the canonical basis \( \{e_1,e_2\}\) implies that \( (\gamma^+,\gamma^-)\) lie respectively in the subspaces \( E^+\) and \( E^-\) . These submanifolds are totally geodesic by virtue of Lemma 3 in [19].
We define a learning process in terms of the Ackley–Hinton–Sejnowski method [20], which is based on minimizing the Kullback–Leibler divergence as a measure of distance between probability distributions. This process can be interpreted in a geometric framework as follows:
Proposition 11 (Geometric Formulation of Learning Process)
The learning process consists of determining whether there exist intersections between the paraholomorphic curve \( \gamma^+ \) and the orthogonal projection of \( \gamma^- \) into the subspace \( E^+ \) .
More precisely, let \( \mathsf{S}\) be a statistical manifold equipped with a fiber bundle structure \( (T\mathsf{S}, \mathsf{S}, \pi, G, F) \) . Consider a geodesic path \( \gamma: I \to \mathsf{S} \) , parametrized by an interval \( I \subset \mathbb{R}\) . The fiber over \( \gamma\) , denoted by \( F_{\gamma}\) , is assumed to decompose into two connected components:
Each component \( \gamma^+ \) and \( \gamma^- \) is contained in a totally geodesic submanifold of \( T\mathsf{S} \) , denoted by \( E^+ \) and \( E^- \) , respectively:
In particular, the learning process is considered successful whenever the distance between the geodesic \( \gamma^+ \) and the orthogonal projection of \( \gamma^- \) into \( E^+ \) decreases towards zero. That is, the process converges if:
This formulation provides a rigorous geometric criterion for assessing the success of learning, leveraging the underlying differential geometry of the statistical manifold.
In other words:
Proposition 12
The learning process consists in determining if there exist intersections of the paraholomorphic curve \( \gamma^+\) with the orthogonal projection of the curve \( \gamma^-\) in the subspace \( E^+\) .
More formally, as described in [25] (Sec. 3), let us denote by \( \Upsilon\) the set of (centered) random variables over \( (\Omega,\mathcal{F},P_{\theta})\) which admit an expansion in terms of the scores of the following form:
By direct calculation, one finds that the log-likelihood \( \ell= \ln\rho\) of the usual (parametric) families of probability distributions belongs to \( \Upsilon_P\) , as does the difference \( \ell -\ell^*\) of the log-likelihoods of two probabilities of the same family.
Given a family of probability distributions such that \( \ell \in \Upsilon_P\) for any \( P\) , let \( \mathcal{U}_P\) denote the set of \( P^*\) such that \( \ell-\ell^*\in \Upsilon_P\) . Then, for any \( P^*\in \mathcal{U}_P\) , we define \( K(P,P^*)=\mathbb{E}_P[\ell - \ell^*]\) .
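As a minimal numerical sketch of the definition \( K(P,P^*)=\mathbb{E}_P[\ell - \ell^*]\) , assume a finite sample space, so that \( P\) and \( P^*\) are probability vectors with strictly positive entries. The function name `relative_entropy` and the sample vectors are illustrative choices, not part of the source.

```python
import math


def relative_entropy(p, p_star):
    """K(P, P*) = E_P[l - l*] with l = ln p and l* = ln p_star (finite sample space)."""
    if len(p) != len(p_star):
        raise ValueError("p and p_star must have the same length")
    if any(x <= 0 for x in p) or any(x <= 0 for x in p_star):
        raise ValueError("all probabilities must be strictly positive")
    # Expectation under P of the difference of log-likelihoods.
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, p_star))


# Illustrative distributions on a three-point sample space.
p = [0.5, 0.25, 0.25]
p_star = [0.25, 0.5, 0.25]
k = relative_entropy(p, p_star)
```

For these two vectors the sum reduces to \( \tfrac{1}{4}\ln 2\) , and \( K(P,P)=0\) as expected for a relative entropy.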
Theorem 35
Let \( \mathsf{S}\) be a statistical manifold equipped with a Riemannian metric and an affine connection. Then the (GWS) determine the evolution of the learning process through the associated geometric constraints.
Proof
Whenever learning is successful, the distance between the curve \( \gamma^+\) and the projection of \( \gamma^-\) on \( E^+\) becomes as small as possible. This corresponds to minimizing the relative entropy \( K(P,P^*)=\mathbb{E}_P[\ell - \ell^*]\) .
The learning process is, by definition, given by a deformation of a pair of geodesics, defined respectively in the pair of totally geodesic manifolds \( E^+, E^-\) . The (GWS) arise through the \( \tilde{Y}_n\) in the potential function \( \tilde{\mathbf{\Phi}}\) , which is directly related to the relative entropy function \( K(P,P^*)\) . Therefore, the (GWS) determine the learning process.
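The minimization of \( K(P,P^*)\) invoked in the proof can be illustrated numerically. The following is a toy sketch only, not the Ackley–Hinton–Sejnowski algorithm of [20]: it assumes a finite sample space, treats \( P\) as a fixed target and \( P^*\) as a model parametrized by a softmax of a vector `theta`, and runs plain gradient descent. The names `softmax`, `learn`, `target`, and the step size and iteration count are all assumptions made for the illustration.

```python
import math


def softmax(theta):
    """Map real parameters to a strictly positive probability vector."""
    m = max(theta)
    weights = [math.exp(t - m) for t in theta]
    total = sum(weights)
    return [w / total for w in weights]


def relative_entropy(p, q):
    """K(P, Q) = E_P[ln p - ln q] on a finite sample space."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def learn(p, steps=500, rate=0.5):
    """Gradient descent on theta -> K(P, softmax(theta)); returns model and K history."""
    theta = [0.0] * len(p)
    history = []
    for _ in range(steps):
        q = softmax(theta)
        history.append(relative_entropy(p, q))
        # For the softmax family, dK/dtheta_j = q_j - p_j.
        theta = [t - rate * (qj - pj) for t, qj, pj in zip(theta, q, p)]
    return softmax(theta), history


target = [0.6, 0.3, 0.1]  # illustrative target distribution P
model, history = learn(target)
```

In this sketch the recorded values of \( K\) decrease along the iteration and the model approaches the target, which is the numerical counterpart of the distance in the geometric criterion tending to zero.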
As in the classical (GW) case, the (GWS) count intersection numbers of the para-holomorphic curves generated by \( \gamma^+\) and \( \gamma^-\) . In fact, we have the following statement:
Corollary 3
Let \( (T\mathsf{S},\mathsf{S},\pi,G,F)\) be the fiber bundle above, where:
Let \( \gamma^-\subset T\mathsf{S}\) be a geodesic in the tangent bundle with respect to the affine connection and let \( \gamma^+\subset E^+\) be a geodesic in the sub-bundle \( E^+\subset T\mathsf{S}\) . Then, the (GWS) determine the number of intersections of the projection of \( \gamma^-\) onto \( E^+\) , with the geodesic \( \gamma^+\) .
[1] Kuratowski, K. Introduction to set theory and topology. Oxford: Pergamon Press; Warszawa: PWN - Polish Scientific Publishers (1972).
[2] Lang, S. Differential and Riemannian manifolds. New York: Springer-Verlag (1995).
[3] Lang, S. Fundamentals of differential geometry. Graduate Texts in Mathematics. New York: Springer-Verlag (1999).
[4] Milnor, J. W. Topology from the differentiable viewpoint. Charlottesville: University Press of Virginia (1965).
[5] Kobayashi, S.; Nomizu, K. Foundations of differential geometry. Wiley Classics Library Edition. New York: John Wiley & Sons (1996).
[6] Sikorski, R. Differential modules. Colloquium Mathematicum, 24(1), 45–79 (1971).
[7] Kashiwara, M.; Schapira, P. Sheaves on manifolds. Grundlehren der mathematischen Wissenschaften, 292. Springer-Verlag (1990).
[8] Billingsley, P. Probability and Measure. 3rd Edition. Wiley Series in Probability and Mathematical Statistics. New York: Wiley (1995).
[9] Parthasarathy, K. R. Probability measures on metric spaces. New York: Academic Press (1967).
[10] Feller, W. An introduction to probability theory and its applications. New York: John Wiley & Sons (1966).
[11] Amari, S. Differential-geometrical methods in statistics. Lecture Notes in Statistics, 28. Berlin: Springer-Verlag (1985).
[12] Amari, S. Information Geometry. In Geometry and Nature, eds. J.-P. Bourguignon, H. Nencka. Contemporary Mathematics, 203, 81–95. Providence, RI: American Mathematical Society (1997).
[13] Morozova, E. A.; Chentsov, N. N. Markov invariant geometry on state manifolds. Journal of Soviet Mathematics, 56(5), 2648–2669 (1991; Russian original 1989).
[14] Morozova, E. A.; Chentsov, N. N. Projective Euclidean geometry and noncommutative probability theory. Trudy Matematicheskogo Instituta imeni V. A. Steklova, 196, 105–113 (1991); Proceedings of the Steklov Institute of Mathematics, 196, 117–127 (1992).
[15] Morozova, E. A.; Chentsov, N. N. Natural geometry of families of probability laws. Itogi Nauki i Tekhniki. Seriya Sovremennye Problemy Matematiki. Fundamental'nye Napravleniya, 83, 133–265 (1991).
[16] Dubrovin, B. Integrable systems and quantum groups. Lecture Notes in Mathematics, 1620, 120–348 (1993).
[17] Manin, Yu. I. Frobenius manifolds, quantum cohomology, and moduli spaces. Colloquium Publications, 47. Providence, RI: American Mathematical Society (1999).
[18] Combe, N. C.; Manin, Yu. I. F-manifolds and geometry information. Bulletin of the London Mathematical Society, 52(5), 777–792 (2020).
[19] Combe, N. C.; Combe, Ph.; Nencka, H. Statistical manifolds & hidden symmetries. Geometric Science of Information, 565–573. Springer (2021).
[20] Ackley, D.; Hinton, G.; Sejnowski, T. A learning algorithm for Boltzmann machines. Cognitive Science, 9, 147–169 (1985).
[21] Combe, N. C. Geometric classification of real ternary octahedral quartics. Discrete Comput. Geom., 60(2), 255–282 (2018).
[22] Kobayashi, S. On holomorphic connections. In Global Differential Geometry and Global Analysis, eds. A. Dold and B. Eckmann, 838. Proceedings of the Colloquium held at the Technical University of Berlin, November 21–24 (1979).
[23] Calabi, E. On the space of Kähler metrics. Proceedings of the National Academy of Sciences, 40(8), 759–760 (1954).
[24] Rozenfeld, B. Geometry of Lie groups. Dordrecht: Springer-Science+Business Media (1997).
[25] Burdet, G.; Combe, Ph.; Nencka, H. Statistical manifolds, self-parallel curves and learning processes. In Seminar on Stochastic Analysis, Random Fields and Applications (Ascona, 1996), 87–99. Progress in Probability, 45. Basel: Birkhäuser (1999).