First published on Friday, Jul 4, 2025 and last modified on Friday, Jul 4, 2025 by François Chaplais.
Mathematical researchers, especially those in early-career positions, face critical decisions about topic specialization with limited information about the collaborative environments of different research areas. The aim of this paper is to study how the popularity of a research topic is associated with the structure of that topic’s collaboration network, as observed by a suite of measures capturing organizational structure at several scales. We apply these measures to 1,938 algorithmically discovered topics across 121,391 papers sourced from arXiv metadata during the period 2020–2025. Our analysis, which controls for the confounding effects of network size, reveals a structural dichotomy: popular topics organize into modular “schools of thought,” while niche topics maintain hierarchical core-periphery structures centered around established experts. This divide is not an artifact of scale, but represents a size-independent structural pattern correlated with popularity. We also document a “constraint reversal”: after controlling for size, researchers in popular fields face greater structural constraints on collaboration opportunities, contrary to conventional expectations. Our findings suggest that topic selection is an implicit choice between two fundamentally different collaborative environments, each with distinct implications for a researcher’s career. To make these structural patterns transparent to the research community, we developed the Math Research Compass (https://mathresearchcompass.com), an interactive platform providing data on topic popularity and collaboration patterns across mathematical topics.
The full source code, data, and reproducible analysis pipeline for this study are publicly available in our GitHub repository: https://github.com/brian-hepler-phd/MRC-Network-Analysis
The structure of scientific collaboration networks shapes knowledge flow and the professional environments in which researchers build their careers. In mathematics, where collaboration patterns are highly varied, understanding these structural differences is important for both individual career navigation and institutional science policy [12].
The evolution of such networks is governed by two distinct classes of mechanisms. Endogenous mechanisms are processes internal to the network’s topology, such as preferential attachment, where a node’s existing connectivity drives the formation of new links [13]. In contrast, exogenous mechanisms are driven by factors external to the network structure, such as homophily based on shared attributes like research interests or institutional affiliation [14, 15]. While network formation theory distinguishes between these processes [16], empirically separating their effects in observational data is a well-known identification problem [17, 18, 19]. It is therefore challenging to determine whether an observed network property is a genuine structural pattern associated with an external factor or simply a predictable artifact of network size and growth.
This problem is particularly relevant in mathematics, where research is organized into numerous subfields with distinct collaborative norms [6, 20]. The popularity of a research area, a proxy for community size and collective attention, is a key exogenous variable likely to be associated with network structure. However, its influence is difficult to disentangle from the endogenous effects of network scale.
This paper addresses this challenge by applying a two-stage statistical analysis to the collaboration networks of 1,938 algorithmically-identified mathematical topics. We first perform a baseline comparison of network metrics between popular and niche topics. Second, we use regression models to control for network size, allowing us to distinguish size-dependent effects from size-independent structural patterns associated with popularity. This approach allows us to test which network properties are robustly associated with the exogenous popularity of a field, versus those that are better explained as endogenous scaling effects. While popularity itself may be influenced by network properties over long time scales, we treat it as a given characteristic of research areas for the purposes of this cross-sectional analysis.
Our analysis reveals a robust structural dichotomy: popular topics organize into modular "schools of thought," while niche topics maintain hierarchical core-periphery structures. We also document a "constraint reversal," where, contrary to initial expectations, researchers in popular fields face greater structural constraints after controlling for size. This approach identifies which structural patterns warrant further investigation through longitudinal or experimental methods to establish causal mechanisms.
This paper proceeds as follows. Section 2 details our methodology for topic identification, network construction, and the two-stage statistical analysis. Section 3 presents our empirical findings, documenting both the initial structural differences and the results after controlling for network size. Section 4 discusses implications for understanding mathematical collaboration environments and their relevance to career navigation. We conclude by acknowledging limitations and suggesting directions for future research linking network structures to career outcomes.
We analyzed the Cornell ArXiv dataset ([1]), which contains metadata for approximately 2.7 million scientific papers. Our analysis focused on mathematics papers published between 2020 and 2025, yielding 121,391 papers across 31 mathematical subfields as classified by the ArXiv Mathematics Subject Classification system.
Research topics were identified using BERTopic [21], a state-of-the-art topic modeling approach. The BERTopic pipeline consists of three main stages. First, it generates semantic vector embeddings for each document using a Bidirectional Encoder Representations from Transformers (BERT) model [22]. Next, it applies Uniform Manifold Approximation and Projection (UMAP) to these embeddings to reduce their dimensionality while preserving local and global structure [23]. Finally, it uses Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) on the dimension-reduced embeddings to identify dense clusters of documents, which constitute the research topics [24]. We concatenated paper titles and abstracts as input text, applied preprocessing to remove common stopwords and mathematical notation artifacts, and generated semantic embeddings using the all-MiniLM-L6-v2 sentence transformer model. UMAP reduced embeddings to 5 dimensions using 15 neighbors and cosine distance, followed by HDBSCAN clustering with a minimum cluster size of 15 papers. This process identified 1,938 distinct research topics, with each paper assigned to its most coherent topic based on the highest probability score. Each topic was assigned a primary mathematical category based on the most frequent ArXiv category among its constituent papers.
A prerequisite for constructing accurate collaboration networks is robust author name disambiguation (AND). Raw author strings from bibliographic sources often contain variations, initials, and ambiguities that can either incorrectly split a single author into multiple nodes or incorrectly merge distinct authors into one. We therefore implemented a dedicated multi-stage AND pipeline, AuthorDisambiguator; see Appendix B for full algorithm details.
Our multi-stage process aligns with current best practices in the field, which have converged on sequential pipelines that combine graph-based similarity with network-aware clustering to achieve high precision on large-scale datasets ([2]; [3]). We adopted a particularly conservative approach, prioritizing high precision to minimize the introduction of false connections that could artificially distort network topology ([4]; [5]).
The pipeline began by parsing and normalizing all unique author name strings (\( N \approx 120,000\) ) from the 121,391 papers. Normalization involved lowercasing, diacritic removal, and standardization of particles and punctuation. Subsequently, a graph-based approach was employed to identify and merge likely variants of the same author (see Appendix B).
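The normalization step can be sketched with the standard library alone; the function name `normalize_name` is illustrative, and the full pipeline (Appendix B) additionally standardizes name particles.

```python
import re
import unicodedata

def normalize_name(raw: str) -> str:
    """Lowercase, strip diacritics, and standardize punctuation/whitespace.

    A minimal sketch of the normalization described in the text; the
    actual pipeline (Appendix B) also handles name particles.
    """
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", raw)
    ascii_only = "".join(ch for ch in decomposed
                         if not unicodedata.combining(ch))
    lowered = ascii_only.lower()
    # Treat periods and commas as separators, then collapse whitespace.
    cleaned = re.sub(r"[.,]+", " ", lowered)
    return re.sub(r"\s+", " ", cleaned).strip()

print(normalize_name("Erdős, P."))   # -> "erdos p"
```

With variants mapped to a common key, the subsequent graph stage only needs to resolve genuinely ambiguous collisions.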
This iterative process reduced the initial 120,767 unique raw author strings to 117,883 canonical author profiles, a 2.39% reduction.
For each of the 1,938 research topics, we constructed an undirected co-authorship network using the disambiguated canonical author profiles generated by our AND pipeline. In these networks, nodes represent unique authors active in the topic, and edges represent a co-authorship relationship on at least one paper within that topic.
The construction process followed established protocols for creating collaboration networks from bibliographic data ([6]). For each paper with multiple authors, a complete graph (or clique) was formed among all co-authors. This means an edge was created between every pair of authors on a given paper.
Only papers with multiple authors contributed to the formation of network edges. However, papers with a single author were retained in our dataset for the calculation of topic-level metrics like the overall collaboration rate.
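The clique-based construction can be sketched in a few lines of pure Python; the function name and input structure (a list of per-paper author-ID lists) are illustrative.

```python
from itertools import combinations
from collections import defaultdict

def build_coauthorship_network(papers):
    """Build an undirected co-authorship network for one topic.

    `papers` is a list of author-ID lists (canonical profiles). Each
    multi-author paper contributes a clique among its authors; edge
    weights count joint papers, which later feeds metrics such as the
    repeated collaboration rate. Single-author papers add a node but
    no edges, matching the construction described in the text.
    """
    nodes = set()
    edge_weights = defaultdict(int)  # (u, v) with u < v -> joint papers
    for authors in papers:
        authors = sorted(set(authors))
        nodes.update(authors)
        for u, v in combinations(authors, 2):
            edge_weights[(u, v)] += 1
    return nodes, dict(edge_weights)

papers = [["a1", "a2", "a3"], ["a2", "a3"], ["a4"]]
nodes, edges = build_coauthorship_network(papers)
# 4 nodes, 3 edges; ("a2", "a3") has weight 2 (a repeated collaboration)
```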
To characterize the structural properties of mathematical collaboration networks, we employ a carefully selected set of 10 network metrics organized into four domains. Together, these domains capture distinct yet complementary aspects of network structure, providing a comprehensive structural signature of each mathematical subfield.
| Analysis Domain | Metric Name | Description |
|---|---|---|
| Collaboration Dynamics (Tie Strength & Persistence) | Collaboration Rate | Proportion of multi-author papers, indicating overall teamwork propensity. |
| | Repeated Collaboration Rate | Proportion of author pairs with multiple joint papers, measuring stability of research teams. |
| Global Topology (Overall Network Shape) | Degree Centralization | Concentration of collaborations around a few central “hub” researchers (hierarchy). |
| | Degree Assortativity | Tendency for highly collaborative authors to work with other highly prolific authors. |
| | Small-world Coefficient (\( \omega\) ) | Balance of high local clustering with short global path lengths for efficient diffusion. |
| | Robustness Ratio | Network’s resilience to the targeted removal of hubs versus random failures. |
| Mesoscopic Organization (Community & Core Structure) | Modularity (Q) | Strength of a network’s division into distinct communities or “schools of thought”. |
| | Coreness Ratio | Proportion of authors belonging to the network’s densely connected core. |
| Researcher Positioning (Individual-level Opportunity) | Average Constraint | Extent to which an individual’s connections are redundant (inverse of brokerage). |
| | Average Effective Size | Average number of non-redundant connections per researcher, measuring access to diverse information. |
This domain quantifies the intensity and persistence of collaborative relationships. Following the theory of tie strength by [35] and its recent experimental validation [25], we measure:
Collaboration rate, the proportion of papers with multiple authors, provides a baseline measure of a field’s propensity for teamwork. This metric captures the overall collaborative culture within mathematical subfields, following the foundational approach of [6].
Repeated collaboration rate is the fraction of collaborations that occur more than once between the same pair of researchers. Building on the operationalization of collaboration persistence by [36], this metric distinguishes between fields characterized by transient partnerships and those fostering stable research teams. The work of [37] on “super ties” demonstrates that such persistent collaborations create multiplicative effects on scientific productivity.
Together, these metrics reveal whether mathematical subfields operate through short-lived project-based teams or cultivate long-term collaborative relationships, with direct implications for knowledge accumulation and research continuity.
This domain captures macro-level structural properties that determine network efficiency and resilience. We employ four complementary metrics:
Degree centralization, following the framework of [38] as applied to scientific networks by [6], measures the extent to which collaborations concentrate around a few highly connected “hub” researchers. High centralization indicates hierarchical structures, while low centralization suggests more egalitarian patterns.
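Freeman's degree centralization compares the observed degree distribution against the maximally centralized star graph. A minimal sketch, assuming the standard Freeman normalization (the exact computational conventions used in the study are given in Appendix A):

```python
def degree_centralization(adj):
    """Freeman degree centralization of an undirected graph.

    `adj` maps node -> set of neighbors. Returns 1.0 for a perfect
    star (maximal hierarchy) and 0.0 for a regular graph, where every
    node has the same degree.
    """
    n = len(adj)
    if n < 3:
        return 0.0
    degrees = [len(nbrs) for nbrs in adj.values()]
    d_max = max(degrees)
    # Normalize by the maximum attainable sum, achieved by a star graph.
    return sum(d_max - d for d in degrees) / ((n - 1) * (n - 2))

star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
print(degree_centralization(star))   # -> 1.0
```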
Degree assortativity measures whether highly collaborative researchers preferentially work with other highly prolific collaborators (assortative) [26]. As [39] demonstrate, this choice involves a critical trade-off: assortative structures tend to produce higher research output but lower novelty, while disassortative structures generate more innovative outcomes.
Small-world coefficient (\( \omega\) ), calculated as \( \omega = (C/C_{\text{random}})/(L/L_{\text{random}})\) [27], captures whether networks optimize both high local clustering for specialized knowledge development and short global path lengths for efficient knowledge diffusion, a balance that [6] identified as characteristic of productive scientific fields.
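The formula is a simple ratio once the random baselines are fixed. In the sketch below we use the analytic Erdős–Rényi approximations \( C_{\text{random}} \approx k/n \) and \( L_{\text{random}} \approx \ln n / \ln k \) as stand-ins; this baseline choice is an assumption (the study's computational details are in Appendix A), and many implementations instead average over sampled random graphs.

```python
import math

def small_world_coefficient(C, L, n, k):
    """Small-world coefficient as defined in the text:
    (C / C_random) / (L / L_random).

    C, L: observed clustering coefficient and average path length.
    n, k: node count and mean degree, used for the Erdos-Renyi
    baselines C_random ~ k/n and L_random ~ ln(n)/ln(k). This analytic
    baseline is a simplifying assumption for illustration.
    """
    c_random = k / n
    l_random = math.log(n) / math.log(k)
    return (C / c_random) / (L / l_random)

# Clustering far above the random baseline with near-random path
# lengths yields a coefficient much greater than 1:
omega = small_world_coefficient(C=0.4, L=4.0, n=1000, k=6)
```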
Robustness ratio quantifies the network’s structural resilience by comparing its tolerance to the targeted removal of high-degree “hub” authors versus random node failures. A low ratio indicates a fragile, scale-free structure, providing crucial context for the stability of the collaborative landscape.
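The paper's exact operationalization is given in Appendix A; the sketch below illustrates one common version of such a ratio, comparing the surviving giant-component fraction under targeted hub removal against the same fraction under random removal (function names and the removal fraction are illustrative assumptions).

```python
import random
from collections import deque

def largest_component_fraction(adj, removed):
    """Fraction of the original nodes that lie in the largest connected
    component after deleting the nodes in `removed` (plain BFS)."""
    alive = set(adj) - removed
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        queue, comp = deque([start]), 0
        seen.add(start)
        while queue:
            u = queue.popleft()
            comp += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, comp)
    return best / len(adj)

def robustness_ratio(adj, frac=0.1, trials=20, seed=0):
    """Giant-component survival under targeted hub removal divided by
    its average survival under random removal. Low values indicate a
    fragile, hub-dependent structure."""
    rng = random.Random(seed)
    k = max(1, int(frac * len(adj)))
    hubs = set(sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:k])
    targeted = largest_component_fraction(adj, hubs)
    rand = sum(
        largest_component_fraction(adj, set(rng.sample(list(adj), k)))
        for _ in range(trials)
    ) / trials
    return targeted / rand

# A star network collapses under targeted removal but usually survives
# random removal, so its ratio is far below 1.
star = {0: set(range(1, 20))}
for leaf in range(1, 20):
    star[leaf] = {0}
ratio = robustness_ratio(star)
```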
This domain reveals intermediate-scale organization, reflecting how research fields self-organize into communities and hierarchies:
Modularity (Q), using the modularity optimization approach of [40], quantifies the strength of a network’s division into research communities or “schools of thought.” This is essential for understanding whether a field fragments into isolated clusters or maintains integrated collaborations.
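For reference, the quantity being optimized is the Newman–Girvan modularity \( Q = \sum_c (e_c - a_c^2) \), where \( e_c \) is the fraction of edges inside community \( c \) and \( a_c \) the fraction of edge endpoints in \( c \). A minimal sketch for a given partition (community detection itself is handled by the optimization method cited in the text):

```python
def modularity(adj, communities):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    `adj` maps node -> set of neighbors; `communities` is a list of
    disjoint node sets covering the graph.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    q = 0.0
    for comm in communities:
        # Each internal edge is seen from both endpoints, hence / 2.
        internal = sum(1 for u in comm for v in adj[u] if v in comm) / 2
        degree_sum = sum(len(adj[u]) for u in comm)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q

# Two disjoint triangles split into their natural communities: Q = 0.5
two_triangles = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2},
    4: {5, 6}, 5: {4, 6}, 6: {4, 5},
}
print(modularity(two_triangles, [{1, 2, 3}, {4, 5, 6}]))   # -> 0.5
```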
Coreness ratio, building on the core-periphery framework of [41] and advances by [42], measures the proportion of researchers belonging to the network’s densely connected core. This reveals whether fields organize around expert-led hierarchies (high coreness) or distribute expertise more evenly.
This domain measures the average opportunity for individual researchers to access diverse information, based on the theory of structural holes [28, 29]:
Average constraint measures the extent to which an individual’s network is redundant. Following Burt’s formulation and recent clarifications [30], lower constraint indicates greater opportunity to bridge structural holes, which is strongly linked to higher productivity and innovation in scientific collaboration [31, 32].
Average effective size directly measures an individual’s brokerage potential by quantifying their number of non-redundant contacts. Higher values indicate that the typical researcher has access to diverse information sources—what [29] termed “vision of options otherwise unseen.”
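Both measures are computable directly from an ego's neighborhood. The sketch below implements Burt's formulas for an unweighted graph, using Borgatti's simplification of effective size; the function names and the unweighted simplification are assumptions for illustration (the study's computational details are in Appendix A).

```python
def effective_size(adj, ego):
    """Borgatti's simplification for unweighted ego networks:
    n - 2t/n, where n is the number of alters and t the number of
    ties among them. Equals n when no alters know each other."""
    alters = adj[ego]
    n = len(alters)
    if n == 0:
        return 0.0
    t = sum(1 for u in alters for v in adj[u] if v in alters) / 2
    return n - 2 * t / n

def constraint(adj, ego):
    """Burt's network constraint for an unweighted graph:
    C_i = sum_j (p_ij + sum_q p_iq * p_qj)^2, with p_ij = 1/degree(i).
    Higher values mean more redundant (closed) neighborhoods."""
    alters = adj[ego]
    total = 0.0
    for j in alters:
        p_ij = 1 / len(adj[ego])
        indirect = sum(
            (1 / len(adj[ego])) * (1 / len(adj[q]))
            for q in alters
            if q != j and j in adj[q]
        )
        total += (p_ij + indirect) ** 2
    return total

# Ego bridging two unconnected alters: no redundancy, low constraint.
bridge = {"ego": {"a", "b"}, "a": {"ego"}, "b": {"ego"}}
# Ego inside a closed triangle: fully redundant, high constraint.
triangle = {"ego": {"a", "b"}, "a": {"ego", "b"}, "b": {"ego", "a"}}

print(effective_size(bridge, "ego"))     # -> 2.0
print(effective_size(triangle, "ego"))   # -> 1.0
```

The triangle ego has higher constraint and lower effective size than the bridging ego, matching the brokerage interpretation in the text.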
These ten metrics, spanning four complementary domains, provide a comprehensive structural signature that captures how mathematical research organizes across multiple scales. This multi-domain approach addresses limitations of single-metric analyses by capturing network properties at multiple scales–from individual positioning to global topology. By combining metrics of tie strength, macro-structure, meso-organization, and micro-positioning, we can identify distinct “collaboration phenotypes" that characterize different mathematical subfields and explain variations in their productivity, innovation, and knowledge diffusion patterns. See Appendix A for more computational details regarding these metrics.
Topics were classified by research popularity based on total paper count. We rank-ordered all 1,938 topics by the number of papers and designated the top 20% as “popular” topics and the bottom 20% as “niche” topics, yielding 387 topics in each group.
Our analysis proceeded in two stages to disentangle popularity effects from network size effects:
Stage 1: Baseline Comparisons. We first assessed distributional assumptions using Shapiro-Wilk tests for normality. Since all metrics violated normality assumptions (all \( p < 0.001\) ), we employed non-parametric statistical tests throughout. This non-normality is expected in network data due to bounded metrics (e.g., collaboration rates constrained to 0-1), power-law degree distributions common in academic networks, and heterogeneous collaboration structures across scientific disciplines ([7], [8], [9]).
Group comparisons used Mann-Whitney U tests to assess statistical significance. This non-parametric test operates by ranking all observations from both groups and calculating the U statistic, which is the number of times a value from one group is larger than a value from the other. This statistic is then used to compute a p-value against the null hypothesis that the two distributions are identical. This approach is robust to non-normal distributions and outliers common in network data.
To quantify the magnitudes of these differences, we calculated the non-parametric effect size Cliff’s delta (\( \delta\) ) ([10]), which estimates the degree of separation between two distributions as the difference in dominance probabilities (i.e., \( \delta = P(X > Y) - P(Y > X)\) ). For interpretation, we use standard thresholds of 0.147 (small), 0.33 (medium), and 0.474 (large), which represent the non-parametric equivalents of Cohen’s original effect size benchmarks ([11]).
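The two statistics above can be computed together in a few lines; the SciPy call is the library's actual API, while the toy data are invented for illustration.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def cliffs_delta(x, y):
    """Cliff's delta: P(X > Y) - P(Y > X) over all cross-group pairs."""
    x, y = np.asarray(x), np.asarray(y)
    greater = (x[:, None] > y[None, :]).sum()
    less = (x[:, None] < y[None, :]).sum()
    return (greater - less) / (len(x) * len(y))

# Toy example: modularity-like scores for two small groups of topics.
popular = [0.91, 0.88, 0.85, 0.90, 0.87]
niche = [0.70, 0.66, 0.72, 0.64, 0.69]

u_stat, p_value = mannwhitneyu(popular, niche, alternative="two-sided")
delta = cliffs_delta(popular, niche)
print(delta)   # -> 1.0 (complete separation, a "large" effect)
```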
A Bonferroni correction was applied for the ten comparisons, setting our significance threshold at \( \alpha = 0.005\) . Confidence intervals for key estimates were computed using bootstrap resampling with 10,000 iterations.
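A percentile bootstrap along these lines can be sketched with the standard library; whether the study used percentile or another bootstrap variant is not specified, so treat this as one plausible implementation.

```python
import random

def bootstrap_ci(values, stat=lambda v: sum(v) / len(v),
                 n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic,
    mirroring the 10,000-iteration resampling described above.
    Defaults to the mean; any statistic of a sample can be passed."""
    rng = random.Random(seed)
    n = len(values)
    stats = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

lo, hi = bootstrap_ci([0.70, 0.66, 0.72, 0.64, 0.69])
# The interval brackets the sample mean (0.682).
```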
Stage 2: Controlling for Network Size. To test whether observed differences persisted beyond size effects, we conducted multiple regression analyses. For each network metric, we fit three models: a simple model regressing the metric on popularity alone, a size control model adding network size as a covariate, and an interaction model including a popularity-by-size term (see Appendix C).
All variables were standardized (mean=0, SD=1) before analysis to ensure comparability of coefficients across metrics. A significant popularity coefficient in the size control model indicates a robust effect that persists beyond network scale. We also examined correlations between network size and each metric to assess confounding strength.
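The logic of the size control can be demonstrated on synthetic data. The sketch below uses NumPy least squares rather than the statsmodels OLS used in the study, and the data-generating coefficients are invented purely to illustrate how a popularity coefficient shrinks once a correlated size variable enters the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic topics: popularity and network size are correlated, and the
# metric depends on BOTH, so regressing on popularity alone overstates
# its effect. (Illustrative numbers, not the study's data.)
popularity = rng.normal(size=n)
size = 0.8 * popularity + 0.6 * rng.normal(size=n)
metric = 0.5 * popularity + 1.0 * size + 0.3 * rng.normal(size=n)

def zscore(v):
    """Standardize to mean 0, SD 1, as in the paper's analysis."""
    return (v - v.mean()) / v.std()

y = zscore(metric)

# Simple model: metric ~ popularity
X1 = np.column_stack([np.ones(n), zscore(popularity)])
beta_simple = np.linalg.lstsq(X1, y, rcond=None)[0][1]

# Size control model: metric ~ popularity + size
X2 = np.column_stack([np.ones(n), zscore(popularity), zscore(size)])
beta_controlled = np.linalg.lstsq(X2, y, rcond=None)[0][1]

# The popularity coefficient attenuates once size is controlled,
# the pattern the two-stage analysis is designed to detect.
print(beta_simple > beta_controlled)   # -> True
```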
All analyses were performed in Python 3.11 using SciPy 1.15.2 and statsmodels 0.14.0.
To ensure our findings were not artifacts of specific parameter choices, we conducted two comprehensive validation analyses for our topic modeling and classification approach.
First, to validate the BERTopic model itself, we performed a sensitivity analysis testing 72 different hyperparameter combinations on a 10,000-document subsample. This revealed that while the exact number of topics generated was moderately sensitive to the min_topic_size parameter (coefficient of variation = 0.44), the overall topic structure was reasonably stable (mean Adjusted Rand Index = 0.34 \( \pm\) 0.06). This analysis informed our choice of baseline parameters for the main study (min_topic_size=15, n_neighbors=15, n_components=5). Furthermore, to ensure the semantic coherence of the generated topics, we conducted a manual validation of a stratified sample of 100 topics. Appendix C (Table 9) presents representative examples from this validation, demonstrating that BERTopic successfully identified genuine mathematical research areas across all size ranges; the full validation found that 74 of the 100 sampled topics were judged semantically coherent.

Second, and most crucially, we generated three additional complete topic models using alternative minimum topic size values (10, 20, and 25), resulting in four distinct topic sets ranging from 1,164 to 2,953 topics. We then re-ran our entire two-stage statistical analysis—both the baseline Mann-Whitney U tests and the full regression models—on all four topic sets (see Appendix C, Table 6). The central finding of our paper, the duality between scale-driven and popularity-driven network effects, remained consistent across all four independent analyses. The same metrics were robustly associated with popularity, and the same metrics were confounded by size in every single run (see Appendix C, Table 5 for full interaction model results across one run). This demonstrates that our conclusions reflect a fundamental phenomenon in the data, independent of the specific granularity of the topic model.
Our analysis encompassed 121,391 mathematics papers from which BERTopic identified 1,938 distinct research topics. The classification yielded 387 popular topics (mean = 314.2 \( \pm\) 345.1 papers) and 387 niche topics (mean = 13.8 \( \pm\) 1.7 papers), representing a 22.8-fold difference in mean paper counts. This extreme difference ensured distinct collaboration environments for comparison while maintaining balanced sample sizes.
Mann-Whitney U tests revealed significant differences in 9 of 10 network metrics between popular and niche topics (Table 1). All significant results survived Bonferroni correction (\( \alpha = 0.005\) ), with most metrics showing large effect sizes by conventional standards for Cliff’s delta.
Popular topics demonstrated substantially higher modularity (mean 0.879 vs. 0.688 for niche topics), indicating organization into distinct communities or “schools of thought.” Niche topics exhibited contrasting organizational patterns characterized by hierarchical structures: these networks demonstrated exceptionally high coreness ratios (0.353 vs. 0.070 for popular topics), alongside higher degree centralization and robustness ratios. Only collaboration rate showed no significant difference between groups (0.755 vs. 0.747, \( p = 0.511\) ).
| Metric | Popular Mean (SD) | Niche Mean (SD) | p-value | Cliff’s \( \delta\) | Effect Size |
|---|---|---|---|---|---|
| Collaboration Rate | 0.755 (0.148) | 0.747 (0.196) | 0.511 | -0.027 | Negligible |
| Repeated Collab. Rate | 0.148 (0.058) | 0.130 (0.143) | \( <0.001\) | 0.288 | Small |
| Degree Centralization | 0.062 (0.043) | 0.142 (0.111) | \( <0.001\) | -0.559 | Large |
| Degree Assortativity | 0.293 (0.226) | 0.496 (0.409) | \( <0.001\) | -0.337 | Medium |
| Modularity | 0.879 (0.081) | 0.688 (0.173) | \( <0.001\) | 0.793 | Large |
| Small World Coeff. | 84.1 (141.7) | 14.9 (17.0) | \( <0.001\) | 0.919 | Large |
| Coreness Ratio | 0.070 (0.064) | 0.353 (0.210) | \( <0.001\) | -0.928 | Large |
| Robustness Ratio | 0.326 (0.231) | 0.776 (0.215) | \( <0.001\) | -0.833 | Large |
| Avg. Constraint | 0.827 (0.081) | 0.918 (0.113) | \( <0.001\) | -0.557 | Large |
| Avg. Effective Size | 1.484 (0.271) | 1.185 (0.214) | \( <0.001\) | 0.700 | Large |
Network size correlations revealed substantial confounding for multiple metrics (Table 3). The strongest associations appeared for robustness ratio (\( r = -0.79\) ), coreness ratio (\( r = -0.73\) ), and small-world coefficient (\( r = 0.72\) ), confirming that a simple bivariate comparison is insufficient to isolate the effects of popularity. To visually illustrate our core finding, Figure 1 displays exemplar networks for a popular and a niche topic with identical author counts, highlighting the stark structural differences that persist even when size is held constant.
| Metric | Correlation with Size |
|---|---|
| Robustness Ratio | -0.785*** |
| Coreness Ratio | -0.733*** |
| Small World Coefficient | 0.722*** |
| Avg. Effective Size | 0.640*** |
| Modularity | 0.616*** |
| Avg. Constraint | -0.541*** |
| Degree Centralization | -0.443*** |
| Degree Assortativity | -0.323*** |
| Repeated Collab. Rate | 0.242*** |
| Collaboration Rate | 0.236*** |
Regression analysis controlling for network size revealed three distinct patterns in how popularity effects manifest (Table 4).
First, a core set of three metrics demonstrated robust popularity effects that persisted after size control. Modularity retained a substantial positive association with popularity (\( \beta = 0.55, p < 0.001\) ), though its effect magnitude decreased by 56% relative to the simple model (\( \beta = 1.26\) ). Coreness ratio likewise retained a robust negative association (\( \beta = -0.73, p < 0.001\) ), and average constraint reversed sign, becoming significantly positive (\( \beta = 0.46\) ) once size was controlled.
Second, a majority of metrics (six of ten) revealed effects that were either fully confounded by size or only marginally significant. For metrics like robustness ratio and degree assortativity, the initial strong associations with popularity vanished entirely after controlling for network size (\( p > 0.6\) ), indicating these are clear artifacts of scale.
More subtly, degree centralization (\( p=0.008\) ) and small-world coefficient (\( p=0.026\) ) showed \( p\) -values that, while not surviving our stringent Bonferroni correction, could be considered marginally significant. This suggests the possibility of weaker, secondary popularity effects on these properties, which may become more apparent with even larger datasets or in different scientific domains. For the conservative purposes of our main analysis, however, we classify these as primarily scale-driven phenomena.
Third, collaboration rate displayed a striking emergent pattern. While the simple comparison showed no significant difference between groups, controlling for network size revealed a strong negative association (\( \beta = -2.33, p < 0.001\) ). This apparent contradiction reflects a masking phenomenon where a positive correlation between size and collaboration masks a negative relationship between popularity and collaboration. At equivalent network sizes, popular topics actually have lower collaboration rates, a relationship only visible after disentangling the two effects.
| Metric | Simple Model \( \beta\) | Size Control \( \beta\) | Classification | R² Improv. |
|---|---|---|---|---|
| Robust Popularity Effects | | | | |
| Modularity | \( 1.26^{***}\) | \( 0.55^{***}\) | Robust | 2.4% |
| Coreness Ratio | \( -1.60^{***}\) | \( -0.73^{***}\) | Robust | 3.2% |
| Avg. Constraint | \( -0.96^{***}\) | \( 0.46^{**}\) | Robust (Reversed) | 9.1% |
| Size-Confounded Effects | | | | |
| Degree Centralization | \( -0.88^{***}\) | \( -0.42\) | Confounded | 1.1% |
| Small World Coeff. | \( 1.75^{***}\) | \( -0.41^{*}\) | Confounded | 10.1% |
| Robustness Ratio | \( -1.62^{***}\) | \( -0.05\) | Confounded | 9.9% |
| Degree Assortativity | \( -0.59^{***}\) | \( -0.003\) | Confounded | 1.7% |
| Avg. Effective Size | \( 1.25^{***}\) | \( -0.28\) | Confounded | 9.4% |
| Repeated Collab. Rate | \( 0.40^{***}\) | \( -0.23\) | Confounded | 2.1% |
| Emergent Effect | | | | |
| Collaboration Rate | \( 0.05\) | \( -2.33^{***}\) | Emergent | 27.9% |
The explanatory power gained from including network size varied substantially, with model \( R^2\) improvements ranging from a modest 1.1% (degree centralization) to a substantial 27.9% (collaboration rate).
Our analysis of 121,391 mathematical papers shows that research topics have fundamentally different collaboration environments, and these differences persist even after controlling for network size. This provides an empirical basis for understanding how topic selection shapes the context of a research career.
The distinction between the organization of popular and niche topics reflects different modes of knowledge production. Popular topics, which have significantly higher modularity (mean 0.879 vs. 0.688), organize into semi-autonomous “schools of thought” pursuing parallel research programs. Conversely, niche topics maintain a pronounced core-periphery structure (coreness ratio 0.353 vs. 0.070 for popular topics), centered around a small set of established experts.

Furthermore, the emergent property whereby popular topics show lower collaboration rates at equivalent network sizes (\( \beta = -2.33\) , \( p < 0.001\) ) adds another dimension to these considerations. This suggests that beyond topological differences, popular and niche fields may also cultivate different collaboration norms. Popular fields, with their established research programs and larger communities, might place a greater emphasis on individual contributions, whereas the tighter-knit, exploratory nature of niche fields may necessitate more interdependent teamwork.
Our analysis of structural constraint reveals a complex relationship with popularity. While a simple bivariate comparison shows researchers in popular topics face less constraint, this relationship reverses after controlling for network size; they instead face significantly higher structural constraints (\( \beta\) = 0.46, \( p < 0.004\) ). This result extends Burt’s theory of structural holes by incorporating a temporal, career-stage dimension [29].
The modular organization of popular fields creates numerous structural holes between communities, a configuration Burt identifies as advantageous for innovation. However, our data shows that the researchers occupying these brokerage positions are predominantly those with higher publication counts and seniority. This suggests a “brokerage ladder”: a career progression from a constrained position within a module toward a boundary-spanning role.
This progression can be understood as a network-based mechanism for the “Matthew Effect” [33], a dynamic formalized in network science as “preferential attachment” [13]. This principle posits that new collaborations are more likely to form with already-prominent researchers. Thus, when a valuable brokerage opportunity arises, it is the senior, high-status researchers who are most likely to attract that connection. The social capital required to span these boundaries is therefore a resource that accumulates over a career.
As [30] note, network size is an intrinsic part of the constraint measure. By statistically controlling for size, our analysis isolates the effect of structural position relative to a researcher’s peers. The high average constraint in popular fields is therefore not paradoxical; it reflects the typical early-career experience. A researcher must first embed within a specific community to build reputation, with brokerage opportunities emerging only as their career matures.
These structural differences carry practical implications for researchers at various career stages, though we emphasize that our analysis characterizes collaboration environments rather than predicting career outcomes. The Math Research Compass operationalizes these insights by providing transparent access to collaboration structures across 1,938 identified topics.
For early-career mathematicians, our findings suggest that topic selection involves an implicit choice between contrasting collaborative environments. Those drawn to popular areas should anticipate working within modular communities where establishing position within a specific research cluster becomes paramount. Success in such environments likely requires different strategies than in niche fields, where direct access to central experts provides clearer mentorship pathways but potentially fewer alternative routes should initial connections prove unproductive.
The emergent property whereby popular topics show lower collaboration rates at equivalent network sizes (\( \beta = -2.33 \), \( p < 0.001 \)) adds another dimension to these considerations. This suggests that beyond structural differences, popular and niche fields may cultivate different collaboration norms, with popular fields potentially emphasizing individual contributions within established research programs.
A key methodological contribution of this work is our two-stage analytical approach, which rigorously separates universal scaling effects from genuine popularity-driven organizational patterns. By first comparing groups and then using regression to control for network size, we demonstrate how univariate analyses can be misleading. The dramatic attenuation or reversal of several effects after size control (Table 4) underscores the necessity of this method for future studies of scientific collaboration.
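To make the two-stage logic concrete, the size-controlled stage can be sketched as an interaction regression. The data frame and column names below are illustrative stand-ins for the per-topic metrics, not our actual pipeline:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy per-topic data; column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "modularity": rng.normal(0.5, 0.1, n),
    "popular": rng.integers(0, 2, n),        # 1 = popular topic, 0 = niche
    "n_authors": rng.integers(20, 2000, n),  # network size
})
df["log_authors"] = np.log(df["n_authors"])

# Stage 2: regress the metric on popularity while controlling for size.
# "popular * log_authors" expands to popular + log_authors + their interaction.
model = smf.ols("modularity ~ popular * log_authors", data=df).fit()
print(model.params)
```

Standardizing the metric and predictors before fitting would yield standardized coefficients of the kind reported in Table 4.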
While our findings are robust within this framework, several limitations define the scope of our conclusions. Our reliance on arXiv data, while comprehensive for many theoretical fields, may incompletely represent mathematical collaboration in applied areas where conference proceedings or industry partnerships are more common. Furthermore, our cross-sectional design establishes strong associations but precludes direct causal inference; future longitudinal work is needed to track how network structures evolve as topics gain or lose popularity. Finally, this study characterizes collaborative environments but does not link them to career outcomes like academic placement or grant success, which remains a valuable avenue for subsequent research.
We proactively addressed two other potential methodological concerns. First, our author name disambiguation (AND) pipeline achieves a high validated precision of 86.0%. Second, to ensure our findings were not an artifact of the COVID-19 pandemic, we performed a full temporal sensitivity analysis, splitting our dataset into “peak pandemic” (2020–2021) and “post-peak” (2022–2025) periods. As detailed in Appendix C (Table 8), the modular-hierarchical duality remained stable and highly significant across both eras. This provides strong evidence that the observed structures are fundamental features of mathematical collaboration, not transient phenomena. Interestingly, the only metrics that showed instability (degree centralization and average constraint) were not significant during the pandemic, suggesting that some hierarchical features may have been temporarily flattened by the global shift to remote work.
Future work should build on this foundation by applying our validated pipeline to longitudinal data and linking structural patterns to concrete career outcomes. Examining how researchers successfully transition between communities within modular fields could also provide actionable insights for those seeking to expand their collaborative reach.
Our comprehensive analysis reveals that mathematical collaboration networks exhibit predictable structural variations based on topic popularity, with these differences persisting after controlling for network size. The duality between universal scaling laws and field-specific organizational patterns suggests that while some network properties emerge mechanically with growth, others reflect genuine adaptations to different research contexts.
By documenting these patterns systematically and making them accessible through the Math Research Compass, we aim to democratize knowledge about collaboration structures that previously remained tacit within mathematical communities. While we stop short of prescriptive career advice, we believe that transparency about these structural realities enables more informed decision-making as researchers navigate the mathematical landscape. Understanding whether one is entering a modular, school-based field or a hierarchical, expert-centered domain represents valuable context for calibrating expectations and developing appropriate collaborative strategies.
All metrics were computed using Python’s NetworkX library (v3.4.2) [34]. Key implementation details are noted below.
The repeated collaboration rate was calculated as the ratio of unique author pairs with more than one joint publication to the total number of unique collaborating pairs:
\[ R_{\text{repeat}} = \frac{\left|\{(i,j) : w_{ij} > 1\}\right|}{\left|\{(i,j) : w_{ij} \geq 1\}\right|}, \]
where \( w_{ij} \) denotes the number of papers coauthored by authors \( i \) and \( j \).
To ensure a robust calculation of the small-world coefficient for potentially disconnected networks, our custom implementation first identified the LCC of each topic network. The actual clustering coefficient (\( C \)) and average shortest path length (\( L \)) were computed on this LCC. The expected values for an equivalent Erdős–Rényi random graph (\( C_{\text{rand}} \) and \( L_{\text{rand}} \)) were then calculated from the number of nodes and edges of the LCC, providing a consistent null model. For networks with over 200 nodes, path lengths were estimated using random sampling to ensure computational feasibility.
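A minimal NetworkX sketch of the LCC-based calculation (omitting the sampling shortcut for large networks):

```python
import math
import networkx as nx

def small_world_coefficient(G):
    """Small-world coefficient (C / C_rand) / (L / L_rand), computed on the
    largest connected component with an Erdos-Renyi null model."""
    lcc = G.subgraph(max(nx.connected_components(G), key=len)).copy()
    n, m = lcc.number_of_nodes(), lcc.number_of_edges()
    if n < 3 or m == 0:
        return float("nan")
    C = nx.average_clustering(lcc)
    L = nx.average_shortest_path_length(lcc)
    # Erdos-Renyi expectations for a graph with the same n and m.
    k = 2 * m / n                 # mean degree
    C_rand = k / (n - 1)          # equals the edge probability p
    L_rand = math.log(n) / math.log(k) if k > 1 else float("nan")
    return (C / C_rand) / (L / L_rand)

G = nx.les_miserables_graph()  # example graph, not part of our corpus
sigma = small_world_coefficient(G)
print(round(sigma, 2))
```

Values well above 1 indicate small-world organization: much more clustering than a random graph with comparably short paths.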
The robustness ratio was calculated by simulating and comparing two node-removal scenarios: we measured the size of the LCC after removing the top 10% of nodes by degree, and again after removing an equal number of randomly selected nodes.
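The comparison can be sketched as follows, assuming targeted removal of the top 10% of nodes by degree against an equal number of random removals (the fraction and tie-breaking are illustrative choices):

```python
import random
import networkx as nx

def lcc_size(G):
    """Size of the largest connected component (0 for an empty graph)."""
    return len(max(nx.connected_components(G), key=len)) if G.number_of_nodes() else 0

def robustness_ratio(G, frac=0.10, seed=42):
    """Compare LCC size after targeted (highest-degree) vs. random removal
    of the same number of nodes; values below 1 indicate hub dependence."""
    k = max(1, int(frac * G.number_of_nodes()))
    hubs = sorted(G.degree, key=lambda x: x[1], reverse=True)[:k]
    targeted = G.copy()
    targeted.remove_nodes_from(n for n, _ in hubs)
    rng = random.Random(seed)
    randomized = G.copy()
    randomized.remove_nodes_from(rng.sample(list(G.nodes), k))
    return lcc_size(targeted) / max(lcc_size(randomized), 1)

G = nx.barabasi_albert_graph(300, 2, seed=1)  # hub-heavy toy network
ratio = robustness_ratio(G)
print(ratio)
```

On a scale-free toy graph like this, hub removal fragments the network far more than random removal, so the ratio falls below 1.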
Networks with fewer than 3 nodes returned default values (0 or NaN, depending on the metric) to ensure computational stability; such cases were minimal.
To ensure the construction of high-precision collaboration networks, we developed and implemented a conservative, multi-stage author name disambiguation (AND) pipeline named AuthorDisambiguator. The pipeline is designed to accurately merge common name variations (e.g., ‘J. Doe’ and ‘John Doe’) while aggressively preventing the incorrect merging of distinct authors with similar names (e.g., ‘Pengcheng Xie’ and ‘Pengxu Xie’), a common failure mode in less conservative systems. The process is organized into three main stages, preceded by a normalization step.
For each of the 121,391 papers, author lists were parsed from the raw arXiv metadata format. All unique author strings were then subjected to a rigorous normalization procedure to create a consistent representation for comparison.
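As an illustration only (the exact normalization rules of our pipeline are not reproduced here), such a procedure might look like:

```python
import re
import unicodedata

def normalize_author(raw):
    """Illustrative normalization: strip diacritics, lowercase, and
    collapse punctuation and whitespace into single spaces."""
    s = unicodedata.normalize("NFKD", raw)
    s = "".join(c for c in s if not unicodedata.combining(c))  # drop accents
    s = s.lower().replace(".", " ")
    s = re.sub(r"[^a-z,\- ]+", " ", s)   # keep letters, commas, hyphens
    s = re.sub(r"\s+", " ", s).strip()   # collapse whitespace
    return s

print(normalize_author("José  M. Álvarez"))  # -> "jose m alvarez"
```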
To efficiently identify potential merge candidates, we first constructed an undirected similarity graph where nodes represent unique normalized author strings.
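A simplified sketch of this stage, using a hypothetical (last name, first initial) blocking key to propose candidate edges:

```python
from collections import defaultdict
import networkx as nx

def build_similarity_graph(names):
    """Block normalized names by (last name, first initial) and connect
    candidates within each block; connected components are merge candidates.
    The blocking key is an illustrative choice, not the paper's exact rule."""
    G = nx.Graph()
    G.add_nodes_from(names)
    blocks = defaultdict(list)
    for name in names:
        parts = name.split()
        if parts:
            blocks[(parts[-1], parts[0][0])].append(name)
    for members in blocks.values():
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                G.add_edge(members[i], members[j])
    return G

names = ["j doe", "john doe", "jane doe", "pengcheng xie", "pengxu xie"]
G = build_similarity_graph(names)
components = sorted(map(sorted, nx.connected_components(G)))
print(components)
```

Note that blocking alone connects both ‘pengcheng xie’ and ‘pengxu xie’, and all three ‘doe’ variants, into single components; the rule-based heuristics of the next stage are what prevent unsafe merges within these candidate clusters.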
The connected components of the similarity graph, representing clusters of potentially identical authors, were then analyzed using a set of conservative, rule-based heuristics to decide whether a merge was safe.
A cluster was only merged into a single canonical profile if all its first-name variants were compatible under this logic. For ambiguous clusters containing multiple distinct and incompatible first names, the algorithm attempted to partition the cluster into smaller, internally consistent subgroups for merging.
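The first-name compatibility rule can be illustrated with a simplified version (the production heuristics are more conservative):

```python
def compatible(a, b):
    """Two first-name variants are compatible if they are identical or one
    is an initial of the other (simplified version of the merge rule)."""
    a, b = a.lower().rstrip("."), b.lower().rstrip(".")
    if a == b:
        return True
    short, full = sorted((a, b), key=len)
    return len(short) == 1 and full.startswith(short)

def cluster_mergeable(first_names):
    """A cluster merges only if every pair of first-name variants is compatible."""
    return all(compatible(a, b)
               for i, a in enumerate(first_names)
               for b in first_names[i + 1:])

print(cluster_mergeable(["J.", "John"]))     # True: initial matches full name
print(cluster_mergeable(["John", "James"]))  # False: two distinct full names
```

Under this logic, a cluster containing ‘J.’, ‘John’, and ‘James’ would fail as a whole, triggering the partitioning step that searches for internally consistent subgroups.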
The final stage used co-authorship and publication patterns to resolve remaining ambiguities among canonical profiles that were not merged by string similarity alone.
The combined similarity was calculated as:
(1)
After all merging stages, the resulting author-to-canonical-name mapping was resolved transitively to ensure that all names in a merge chain pointed to a single, final canonical profile. The representative name for each merged cluster was chosen as the most complete name variant available, prioritizing names with more components and fewer initials. This process reduced the 120,767 unique raw author strings to 117,883 canonical author profiles, with a manually validated precision of 86.0%.
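The transitive resolution step amounts to path compression over the merge mapping; a minimal sketch:

```python
def resolve_transitive(mapping):
    """Collapse merge chains (a -> b -> c) so every raw name maps directly
    to its final canonical profile, compressing paths as a side effect."""
    def find(name):
        seen = []
        while name in mapping and mapping[name] != name:
            seen.append(name)
            name = mapping[name]
        for s in seen:          # path compression
            mapping[s] = name
        return name
    return {name: find(name) for name in list(mapping)}

chain = {"j doe": "john doe", "john doe": "john a doe"}
print(resolve_transitive(chain))  # both names map to "john a doe"
```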
Standardized coefficients (\( \beta \)) from the interaction model:

| Metric | Popularity | log(Authors) | Interaction Term | Adj. R² (Full Model) |
|---|---|---|---|---|
| Modularity | \( 0.208\) | \( 0.811^{***}\) | \( 0.655\) | 0.407 |
| Coreness Ratio | \( -0.507^{**}\) | \( -0.700^{***}\) | \( -0.354^{**}\) | 0.562 |
| Avg. Constraint | \( 0.641^{***}\) | \( -0.865^{***}\) | \( -0.395\) | 0.306 |
| Collaboration Rate | \( -3.654^{***}\) | \( 3.116^{***}\) | \( 1.383^{***}\) | 0.614 |
| Degree Centralization | \( -0.656^{***}\) | \( 0.215\) | \( 0.435\) | 0.215 |
| Small World Coeff. | \( -0.218\) | \( 0.387^{**}\) | \( 0.347\) | 0.539 |
| Robustness Ratio | \( -0.087\) | \( -0.560^{***}\) | \( 0.063\) | 0.617 |
| Degree Assortativity | \( 0.074\) | \( -0.356^{*}\) | \( -0.280\) | 0.105 |
| Avg. Effective Size | \( -0.270\) | \( 0.593^{***}\) | \( 0.021\) | 0.412 |
| Repeated Collab. Rate | \( -0.545^{**}\) | \( 0.764^{***}\) | \( 0.579\) | 0.081 |
Effect size (Cliff’s \( \delta \)) at different min_topic_size settings, with intervals in brackets:

| Network Metric | 10 | 15 (main) | 20 | 25 |
|---|---|---|---|---|
| Metrics where Popular > Niche | | | | |
| Modularity | 0.905 [0.88, 0.93] | 0.785 [0.74, 0.83] | 0.794 [0.73, 0.85] | 0.846 [0.80, 0.89] |
| Avg. Effective Size | 0.818 [0.78, 0.85] | 0.705 [0.65, 0.76] | 0.727 [0.66, 0.79] | 0.693 [0.63, 0.76] |
| Repeated Collab. Rate | 0.590 [0.52, 0.65] | 0.284 [0.20, 0.37] | 0.308 [0.20, 0.42] | 0.345 [0.25, 0.44] |
| Metrics where Niche > Popular | | | | |
| Coreness Ratio | -0.678 [-0.74, -0.61] | -0.929 [-0.95, -0.90] | -0.928 [-0.96, -0.89] | -0.902 [-0.94, -0.86] |
| Robustness Ratio | -0.894 [-0.92, -0.86] | -0.840 [-0.88, -0.80] | -0.906 [-0.94, -0.87] | -0.866 [-0.91, -0.82] |
| Avg. Constraint | -0.361 [-0.43, -0.29] | -0.565 [-0.63, -0.50] | -0.557 [-0.64, -0.47] | -0.512 [-0.59, -0.43] |
| Degree Assortativity | -0.310 [-0.40, -0.22] | -0.326 [-0.40, -0.24] | -0.348 [-0.45, -0.24] | -0.278 [-0.38, -0.18] |
| Degree Centralization | -0.075 [-0.15, 0.00] | -0.558 [-0.62, -0.49] | -0.658 [-0.73, -0.58] | -0.594 [-0.67, -0.51] |
| Metric with No Significant Difference | | | | |
| Collaboration Rate | -0.055 [-0.13, 0.02] | -0.027 [-0.11, 0.05] | 0.135 [-0.03, 0.24] | 0.028 [-0.07, 0.13] |
Comparison of the binary popularity model with the best continuous model:

| Network Metric | Binary \( \beta \) | Binary Adj. R² | Continuous Type | Continuous \( \beta \) | Continuous Adj. R² | Cont. Better? | Effect Consist. |
|---|---|---|---|---|---|---|---|
| Collaboration Rate | \( -3.915^{***}\) | 0.335 | Log Quadratic | \( -4.886^{***}\) | 0.974 | Yes | Medium (62%) |
| Repeated Collab. Rate | -0.200 | 0.068 | Log Quadratic | \( -0.631^{***}\) | 0.076 | Yes | Medium (62%) |
| Degree Centralization | \( -1.182^{**}\) | 0.313 | Log Quadratic | \( -0.809^{***}\) | 0.224 | No | High (85%) |
| Degree Assortativity | -0.028 | 0.146 | Log Quadratic | 0.300 | 0.110 | No | Medium (46%) |
| Modularity | \( 1.102^{**}\) | 0.496 | Log Quadratic | 0.061 | 0.399 | No | High (85%) |
| Small World Coeff. | \( -3.403^{***}\) | 0.675 | Z-score + Size | \( 0.908^{***}\) | 0.642 | No | Medium (69%) |
| Coreness Ratio | \( -1.020^{***}\) | 0.636 | Log Quadratic | -0.196 | 0.554 | No | High (85%) |
| Robustness Ratio | -0.345 | 0.765 | Robust + Size | 0.055 | 0.604 | No | Medium (46%) |
| Avg. Constraint | \( 0.932^{**}\) | 0.394 | Log Quadratic | \( 1.000^{***}\) | 0.331 | No | High (100%) |
| Avg. Effective Size | \( -0.743^{*}\) | 0.538 | Log Quadratic | \( -0.431^{**}\) | 0.406 | No | High (85%) |
Temporal sensitivity of effect sizes across the two periods:

| Network Metric | Cliff’s \( \delta \) (2020–2021) | Cliff’s \( \delta \) (2022–2025) | Consistent?\(^{a}\) |
|---|---|---|---|
| Consistent Effects (Popular > Niche) | | | |
| Modularity | 0.959*** | 0.875*** | Yes |
| Avg. Effective Size | 0.751*** | 0.706*** | Yes |
| Small World Coefficient | 0.794*** | 0.857*** | Yes |
| Consistent Effects (Niche > Popular) | | | |
| Coreness Ratio | -0.585*** | -0.929*** | Yes |
| Robustness Ratio | -0.715*** | -0.800*** | Yes |
| Degree Assortativity | -0.431*** | -0.419*** | Yes |
| Inconsistent or Non-Significant Effects | | | |
| Degree Centralization | 0.005 | -0.542*** | No\(^{b}\) |
| Avg. Constraint | -0.127 | -0.467*** | No\(^{c}\) |
| Topic Keywords (truncated) | Size (papers) | Quality (1–5) | Coherence (1–5) | Artifact? | Mathematical Area |
|---|---|---|---|---|---|
| Small Topics (15–25 papers) | | | | | |
| Fermion systems, causal... | 15 | 4 | 4 | No | Mathematical Physics |
| Cherednik algebras, rational... | 16 | 5 | 5 | No | Representation Theory |
| Parking functions, parking... | 17 | 4 | 4 | No | Combinatorics |
| Vortex filaments, filament... | 18 | 4 | 4 | No | Fluid Dynamics |
| Medium Topics (50–100 papers) | | | | | |
| Ricci curvature, Ricci flat... | 54 | 5 | 5 | No | Differential Geometry |
| Iwasawa theory, Iwasawa main... | 58 | 5 | 5 | No | Number Theory |
| Weil-Petersson metric... | 68 | 5 | 5 | No | Teichmüller Theory |
| Siegel modular forms... | 88 | 5 | 5 | No | Number Theory |
| Large Topics (150+ papers) | | | | | |
| Hopf algebras, Hopf algebra... | 171 | 5 | 5 | No | Quantum Groups |
| Mean curvature flow... | 155 | 5 | 5 | No | Geometric Analysis |
| Bergman spaces, Bergman... | 206 | 5 | 5 | No | Complex Analysis |
| Very Large Topics (400+ papers) | | | | | |
| Elliptic curves, elliptic... | 659 | 5 | 5 | No | Algebraic Geometry |
| Knot invariants, knot... | 698 | 5 | 5 | No | Knot Theory |
[1] Colin B Clement, Matthew Bierbaum, Kevin P O'Keeffe, and Alexander A Alemi. On the use of arXiv as a dataset. arXiv preprint arXiv:1905.00075, 2019. 10.48550/arXiv.1905.00075.
[2] Andreas Rehs. A supervised machine learning approach to author disambiguation in the Web of Science. Journal of Informetrics, 15 (3): 101166, 2021. ISSN 1751-1577. 10.1016/j.joi.2021.101166.
[3] Yibo Chen, Zhiyi Jiang, Jianliang Gao, Hongliang Du, Liping Gao, and Zhao Li. A supervised and distributed framework for cold-start author disambiguation in large-scale publications. Neural Computing and Applications, 35: 13093–13108, 2023. 10.1007/s00521-020-05684-y.
[4] Jinseok Kim. Scale‐free collaboration networks: An author name disambiguation perspective. Journal of the Association for Information Science and Technology, 70 (7): 685–700, January 2019. ISSN 2330-1643. 10.1002/asi.24158.
[5] Andreas Strotmann, Dangzhi Zhao, and Tania Bubela. Author name disambiguation for collaboration network analysis and visualization. Proceedings of the American Society for Information Science and Technology, 46 (1): 1–20, 2009. 10.1002/meet.2009.1450460218.
[6] Mark EJ Newman. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98 (2): 404–409, 2001. 10.1073/pnas.021544898.
[7] Mark EJ Newman. Coauthorship networks and patterns of scientific collaboration. Proceedings of the National Academy of Sciences, 101 (suppl_1): 5200–5205, 2004. 10.1073/pnas.0307545100.
[8] Mark EJ Newman. Power laws, Pareto distributions and Zipf's law. Contemporary Physics, 46 (5): 323–351, 2005. 10.1080/00107510500052444.
[9] Maria Cristiana Martini, Elvira Pelle, Francesco Poggi, and Andrea Sciandra. The role of citation networks to explain academic promotions: an empirical analysis of the Italian national scientific qualification. Scientometrics, 128: 4235–4263, 2023. 10.1007/s11192-022-04485-5.
[10] Norman Cliff. Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological Bulletin, 114 (3): 494–509, 1993. 10.1037/0033-2909.114.3.494.
[11] J. Romano, J. D. Kromrey, J. Coraggio, and J. Skowronek. Appropriate statistics for ordinal-level data: Should we really be using t-tests and anovas on ranks? The Journal of Experimental Education, 74 (4): 347–369, 2006.
[12] Stefan Wuchty, Benjamin Jones, and Brian Uzzi. The increasing dominance of teams in production of knowledge. Science, 316 (5827): 1036–1039, 2007. 10.1126/science.1136099.
[13] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286 (5439): 509–512, 1999. 10.1126/science.286.5439.509.
[14] Miller McPherson, Lynn Smith-Lovin, and James M Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27: 415–444, 2001. ISSN 1545-2115. 10.1146/annurev.soc.27.1.415.
[15] Kazi Zainab Khanam, Gautam Srivastava, and Vijay Mago. The homophily principle in social network analysis: A survey. Multimedia Tools and Applications, 82 (6): 8811–8854, 2022. ISSN 1380-7501. 10.1007/s11042-021-11857-1.
[16] Matthew O Jackson and Asher Wolinsky. A strategic model of social and economic networks. Journal of Economic Theory, 71 (1): 44–74, 1996. 10.1006/jeth.1996.0108.
[17] Charles F. Manski. Identification of endogenous social effects: The reflection problem. The Review of Economic Studies, 60 (3): 531–542, 1993. ISSN 0034-6527. 10.2307/2298123.
[18] Cosma Rohilla Shalizi and Andrew C. Thomas. Homophily and contagion are generically confounded in observational social network studies. Sociological Methods & Research, 40 (2): 211–239, 2011. ISSN 0049-1241. 10.1177/0049124111404820.
[19] Mario V Tomasello, Mauro Napoletano, Antonios Garas, and Frank Schweitzer. The role of endogenous and exogenous mechanisms in the formation of R&D networks. Scientific Reports, 4 (1): 5679, 2014. 10.1038/srep05679.
[20] J. C. Brunson, S. Fassino, A. McInnes, M. Narayan, B. Richardson, C. Franck, and R. Laubenbacher. Evolutionary events in a mathematical sciences research collaboration network. Scientometrics, 99 (3): 973–998, 2014. 10.1007/s11192-013-1209-z.
[21] Maarten Grootendorst. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint, 2022. 10.48550/arXiv.2203.05794.
[22] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint, 2019. 10.48550/arXiv.1810.04805.
[23] Leland McInnes, John Healy, Nathaniel Saul, and Lukas Großberger. UMAP: Uniform manifold approximation and projection. Journal of Open Source Software, 3 (29): 861, 2018. 10.21105/joss.00861.
[24] Leland McInnes, John Healy, and Steve Astels. hdbscan: Hierarchical density based clustering. Journal of Open Source Software, 2 (11): 205, 2017. 10.21105/joss.00205.
[25] Karthik Rajkumar, Guillaume Saint-Jacques, Iavor Bojinov, Erik Brynjolfsson, and Sinan Aral. A causal test of the strength of weak ties. Science, 377 (6612): 1304–1310, 2022. 10.1126/science.abl4476.
[26] Mark EJ Newman. Assortative mixing in networks. Phys. Rev. Lett., 89: 208701, 2002. 10.1103/PhysRevLett.89.208701.
[27] Duncan J Watts and Steven H Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393 (6684): 440–442, 1998. 10.1038/30918.
[28] Ronald S. Burt. Structural Holes. Harvard University Press, Cambridge, MA, 1992.
[29] Ronald S Burt. Structural holes and good ideas. American Journal of Sociology, 110 (2): 349–399, 2004. 10.1086/421787.
[30] Martin G Everett and Stephen P Borgatti. Unpacking Burt's constraint measure. Social Networks, 62: 65–73, 2020. 10.1016/j.socnet.2020.02.001.
[31] Wenlong Yang and Yang Wang. Higher-order structures of local collaboration networks are associated with individual scientific productivity. EPJ Data Science, 13 (1): 15, 2024. 10.1140/epjds/s13688-024-00453-6.
[32] Shuang Liao, Christopher Lavender, and Huiwen Zhai. Factors influencing the research impact in cancer research: a collaboration and knowledge network analysis. Health Research Policy and Systems, 22 (1): 96, 2024. 10.1186/s12961-024-01205-8.
[33] Robert K Merton. The Matthew effect in science. Science, 159 (3810): 56–63, 1968. 10.1126/science.159.3810.56.
[34] Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. Exploring network structure, dynamics, and function using NetworkX. In Gaël Varoquaux, Travis Vaught, and Jarrod Millman, editors, Proceedings of the 7th Python in Science Conference (SciPy 2008), pages 11–15, 2008. URL https://www.osti.gov/biblio/960616.
[35] Mark S Granovetter. The strength of weak ties. American Journal of Sociology, 78 (6): 1360–1380, 1973.
[36] Jane Payumo, Guangming He, Anusha Chintamani Manjunatha, Devin Higgins, and Scout Calvert. Mapping collaborations and partnerships in SDG research. Frontiers in Research Metrics and Analytics, 5, 2021. 10.3389/frma.2020.612442.
[37] Alexander Michael Petersen. Quantifying the impact of weak, strong, and super ties in scientific careers. Proceedings of the National Academy of Sciences, 112 (34): E4671–E4680, 2015. 10.1073/pnas.1501444112.
[38] Linton C. Freeman. Centrality in social networks conceptual clarification. Social Networks, 1 (3): 215–239, 1978. ISSN 0378-8733. 10.1016/0378-8733(78)90021-7.
[39] Rajat Khanna and Isin Guler. Degree assortativity in collaboration networks and invention performance. Strategic Management Journal, 43 (7): 1402–1430, 2022. https://doi.org/10.1002/smj.3367.
[40] Mark EJ Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103 (23): 8577–8582, 2006.
[41] Stephen P Borgatti and Martin G Everett. Models of core/periphery structures. Social Networks, 21 (4): 375–395, 2000. 10.1016/S0378-8733(99)00019-2.
[42] Xiao Zhang, Travis Martin, and M. E. J. Newman. Identification of core-periphery structure in networks. Phys. Rev. E, 91: 032803, 2015. 10.1103/PhysRevE.91.032803.