The relation between Pearson's correlation coefficient r and Salton's cosine measure (2009)
The relation between Pearson's correlation coefficient and Salton's cosine measure is revealed based on the different possible values of the division of the L1-norm and the L2-norm of a vector. These...
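As a rough illustration of the two measures being related here, the sketch below computes Salton's cosine and Pearson's r for two small vectors in plain Python. The function names and the sample data are hypothetical; the only facts relied on are the standard definitions, under which Pearson's r equals the cosine of the mean-centred vectors. It does not reproduce the paper's norm-based argument.

```python
import math

def cosine(x, y):
    """Salton's cosine: inner product divided by the product of the L2-norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson(x, y):
    """Pearson's r, computed as the cosine of the mean-centred vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])

# Hypothetical sample vectors (e.g. co-citation counts of two authors).
x = [3, 1, 0, 2]
y = [2, 1, 1, 4]
print(cosine(x, y), pearson(x, y))
```

Both functions assume vectors of equal length that are neither all-zero (cosine) nor constant (Pearson), since either case would divide by zero.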
Mathematical study of h-index sequences (2009)
This paper studies mathematical properties of h-index sequences as developed by Liang Liming [h-index sequence and h-index matrix: constructions and applications. Scientometrics 69(1), 153-159,...
Characteristic scores and scales based on h-type indices (2009)
Based on the rank-order citation distribution of e.g. a researcher, one can define certain points on this distribution, hereby summarizing the citation performance of this researcher. Previous work...
Characteristic scores and scales in a Lotkaian framework (2009)
The characteristic scores and scales (CSS), introduced by Glänzel and Schubert [Journal of Information Science 14, 123-127, 1988] and further studied in subsequent papers of Glänzel, can be...
We present a rationale for the Hirsch-index rank-order distribution and prove that it is a power law (hence a straight line in the log-log scale). This is confirmed by experimental data of Pyykkö...
An econometric property of the g-index (2009)
Let $X=(x_1,\ldots,x_N)$ and $Y=(y_1,\ldots,y_N)$ be two decreasing vectors with positive coordinates such that $\sum_{j=1}^{N} x_j = \sum_{j=1}^{N} y_j$ (representing e.g. citation data of articles of two authors...
Mathematical derivation of the impact factor distribution (2009)
Experimental data in Mansilla, Köppen, Cocho and Miramontes [Journal of Informetrics 1(2), 155-160, 2007] reveal that, if one ranks a set of journals (e.g. in a field) in decreasing order of their...
Duality in information retrieval and the hypergeometric distribution (2008)
Duality is an important topic in informetrics, especially in connection with the classical informetric laws. Yet, this concept is less studied in information retrieval. It deals with the unification...
A measure for the cohesion of weighted networks (2008)
A generalization of both the Botafogo-Rivlin-Shneiderman compactness measure and the Wiener index is presented. These new measures for the cohesion of networks can be used in case a dissimilarity...
TOP-curves (2008)
Preprint of an article accepted for publication in JASIST, © 2006.
Leo Egghe, Ronald Rousseau, Sandra Rousseau
Several characteristics of classical Lorenz curves make them unsuitable for the study of a group of top-performers. TOP-curves, defined as a kind of mirror image of TIP-curves used in poverty...
Topological aspects of information retrieval (2008)
Let (DS, QS, sim) be a retrieval system consisting of a document space DS, a query space QS, and a function sim, expressing the similarity between a document and a query. Following Everett and Cater,...
Power laws, such as Zipf's law, and exponential relations, leading to straight lines in logarithmic or semi-logarithmic scales, are presented in a unified setting. It is shown that the class of...
General results on transformations on information production processes (IPPs), involving transformations of the h-index and related indices, are applied in concrete, simple cases: doubling the...
EGGHE, Leo, Ravichandra Rao, I.K.
To appear 2008
The mathematical relation between the impact factor and the uncitedness factor (2008)
In a general framework, given a set of articles and their received citations (time periods of publication or citation are not important here) one can define the impact factor (IF) as the total number...
Modelling successive h-indices (2008)
From a list of papers of an author, ranked in decreasing order of the number of citations to these papers one can calculate this author's Hirsch index (or h-index). If this is done for a group of...
Study of different h-indices for groups of authors (2008)
In this article, for any group of authors, we define three different h-indices. First, there is the successive h-index h2 based on the ranked list of authors and their h-indices h1 as defined by...
The article studies the influence of the query formulation of a topic on its h-index. In order to generate pure random sets of documents, we used N-grams (N variable) to measure this influence:...
An h-index weighted by citation impact (2008)
An h-type index is proposed which depends on the obtained citations of articles belonging to the h-core. This weighted h-index, denoted as hw, is presented in a continuous setting and in a discrete...
The influence of merging on h-type indices (2008)
Each information production process has a unique h-index. This paper studies the problem: what are possible h-index values if we merge two or more IPPs?
A model for the size-frequency function of co-author pairs (2008)
Lotka’s law was formulated to describe the number of authors with a certain number of publications. Empirical results [S.A. Morris and M.L. Goldstein. JASIST 58(12), 1764-1782, 2007] indicate that...
Performance and its relation with productivity in Lotkaian systems (2008)
In general information production processes (IPPs), we define productivity as the total number of sources but we present a choice of seven possible definitions of performance: the mean or median...
The relation between Pearson’s correlation coefficient r and Salton’s cosine measure (2008)
The relation between Pearson’s correlation coefficient and Salton’s cosine measure is revealed based on the different possible values of the division of the L1-norm and the L2-norm of a vector. These...
Mathematical derivation of the impact factor distribution (2008)
Experimental data in Mansilla, Köppen, Cocho and Miramontes [Journal of Informetrics 1(2), 155-160, 2007] reveal that, if one ranks a set of journals (e.g. in a field) in decreasing order of their...
The distribution of the uncitedness factor and its functional relation with the impact factor (2008)
The uncitedness factor of a journal is its fraction of uncited articles. Given a set of journals (e.g. in a field) we can determine the rank-order distribution of these uncitedness factors. Hereby we...
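The two journal-level quantities discussed in these entries can be sketched in a few lines of Python. The uncitedness factor follows the definition given above (the fraction of uncited articles). For the impact factor, the sketch assumes the usual reading of total citations divided by the number of articles, since the abstract defining it is truncated. The function names and the sample counts are hypothetical.

```python
def impact_factor(citations):
    """Assumed definition: total citations divided by the number of articles."""
    return sum(citations) / len(citations)

def uncitedness_factor(citations):
    """Fraction of articles that received no citations."""
    uncited = sum(1 for c in citations if c == 0)
    return uncited / len(citations)

# Hypothetical citation counts for the articles of one journal.
journal = [0, 0, 1, 3, 6]
print(impact_factor(journal))       # mean number of citations per article
print(uncitedness_factor(journal))  # share of articles with zero citations
```

Both functions expect a non-empty list of non-negative integers, one entry per article.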
Time-dependent Lotkaian informetrics incorporating growth of sources and items (2008)
In a previous article, static Lotkaian theory was extended by introducing a growth function for the items. In this article, a second general growth function – this time for the sources – is...
The influence of transformations on the h-index and the g-index (2007)
In a previous paper we introduced a general transformation on sources and one on items in an arbitrary information production process (IPP). In this paper we investigate the influence of these...
An h-index weighted by citation impact (2007)
An h-type index is proposed which depends on the obtained citations of articles belonging to the h-core. This weighted h-index, denoted as hw, is presented in a continuous setting and in a discrete...
Evolution of information production processes (IPPs) can be described by a general transformation function for the sources and for the items. It generalises the Fellman–Jakobsson transformation...
A probabilistic model is presented to estimate the number of lost multi-copy documents, based on retrieved ones. For this we only need the number of retrieved documents of which we have one copy and...
The model for the cumulative nth citation distribution, as developed in [L. Egghe, I.K. Ravichandra Rao, Theory of first-citation distributions and applications, Mathematical and Computer Modelling...
The R- and AR-indices: Complementing the h-index (2007)
Jin, Bihui, Liang, Liming, Rousseau, Ronald, EGGHE, Leo
Based on the foundation laid by the h-index we introduce and study the R- and AR-indices. These new indices eliminate some of the disadvantages of the h-index, especially when they are used in...
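The abstract above is truncated before any formula appears, so the following sketch rests on the commonly cited definitions rather than on the text itself: the R-index as the square root of the total citations in the h-core, and the AR-index as the square root of the sum of citations per year of age over the h-core. All names and the sample data are hypothetical, and ties at the edge of the h-core are resolved simply by sort order.

```python
import math

def h_core(papers):
    """papers: list of (citations, age_in_years) pairs; returns the h most cited."""
    ranked = sorted(papers, key=lambda p: p[0], reverse=True)
    h = 0
    for rank, (cites, _age) in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return ranked[:h]

def r_index(papers):
    """Assumed definition: sqrt of the total citations in the h-core."""
    return math.sqrt(sum(cites for cites, _age in h_core(papers)))

def ar_index(papers):
    """Assumed definition: sqrt of the sum of citations/age over the h-core."""
    return math.sqrt(sum(cites / age for cites, age in h_core(papers)))

# Hypothetical publication record: (citations, age in years), ages > 0.
papers = [(16, 4), (9, 3), (4, 2), (1, 1)]
print(r_index(papers), ar_index(papers))
```

Ages must be strictly positive, otherwise the AR-index computation divides by zero.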
Distributions of the h-index and the g-index (2007)
In every scientific research area, each scientist has a unique h-index and g-index. This paper addresses the problem of determining the distribution of these indexes over the scientists. We apply...
Uncertainty and information: Foundations of generalized information theory. (2007)
Hasselt Univ, B-3590 Diepenbeek, Belgium. EGGHE, L, Hasselt Univ, Campus Diepenbeek Agorolaan, B-3590 Diepenbeek, Belgium. leo.egghe@uhasselt.be
A note on measuring overlap (2006)
In measuring the overlap between two sets A and B (e.g. libraries, databases, …) one is obliged to calculate the overlap of A with respect to B (i.e. the fraction of elements of B that are...
Dynamic h-index: the Hirsch index in function of time (2006)
When we have a group of papers and when we fix the present time we can determine the unique number h being the number of papers that received h or more citations while the other papers received a...
Empirical and combinatorial study of country occurrences in multi-authored papers (2006)
Papers written by several authors can be classified according to the countries of the author affiliations. The empirical part of this paper consists of two datasets. One dataset consists of 1,035...
Theory and practise of the g-index (2006)
The g-index is introduced as an improvement of the h-index of Hirsch to measure the global citation performance of a set of articles. If this set is ranked in decreasing order of the number of...
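The abstract breaks off before the definition is complete, so this sketch uses the standard one: with articles ranked in decreasing order of citations, g is the largest rank such that the top g articles together received at least g² citations. The function name and sample data are hypothetical, and the sketch caps g at the number of articles, which is one of several conventions.

```python
def g_index(citations):
    """Largest rank g whose top-g articles total at least g*g citations."""
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites  # cumulative citations of the top `rank` articles
        if total >= rank * rank:
            g = rank
    return g

# Hypothetical citation counts for one author's articles.
print(g_index([10, 8, 5, 4, 3]))
```

Because the cumulative total rewards highly cited articles, g is never smaller than the h-index of the same list.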
Systems without low-productive sources (2006)
Information production processes (IPPs) without low-productive sources are studied. A success-breeds-success or preferential attachment mechanism is established in which, from some point in time on,...
EGGHE, Leo, RAO, Ravichandra, Sahoo, Bibhuti Bhusan
In a recent paper [H. F. MOED, E. GARFIELD: In basic science the percentage of “authoritative” references decreases as bibliographies become shorter. Scientometrics 60 (3) (2004) 295-303] the...
An informetric model for the Hirsch-index (2006)
The h-index (or Hirsch-index) was defined by Hirsch in 2005 as the number h such that, for a general group of papers, h papers received at least h citations while the other papers received no more...
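The definition quoted above translates directly into a short routine. This is a minimal sketch in Python with a hypothetical function name and sample data; it implements only the discrete definition, not the informetric model the paper develops.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # later papers have even fewer citations
    return h

# Hypothetical citation counts for a group of papers.
print(h_index([10, 8, 5, 4, 3]))
```

Sorting first makes the rank comparison valid: once a paper at rank r has fewer than r citations, no later rank can qualify.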
Classical information retrieval and overlap measures such as the Jaccard index, the Dice coefficient and Salton's cosine measure can be characterized by Lorenz curves. This result demonstrates the...
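For reference, the three overlap measures named above can be written for plain sets as follows. This is a minimal Python sketch with hypothetical function names and sample sets; it shows only the set-based forms of the measures, not their characterization by Lorenz curves.

```python
import math

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|"""
    return len(a & b) / len(a | b)

def dice(a, b):
    """2 |A ∩ B| / (|A| + |B|)"""
    return 2 * len(a & b) / (len(a) + len(b))

def cosine(a, b):
    """Salton's cosine for sets: |A ∩ B| / sqrt(|A| |B|)"""
    return len(a & b) / math.sqrt(len(a) * len(b))

# Hypothetical document sets, e.g. the holdings of two small collections.
A = {"a", "b", "c", "d"}
B = {"c", "d", "e", "f"}
print(jaccard(A, B), dice(A, B), cosine(A, B))
```

All three functions assume non-empty sets; each returns 1 for identical sets and 0 for disjoint ones.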
Untangling Herdan’s law and Heaps’ law: mathematical and informetric arguments (2005)
Herdan’s law in linguistics and Heaps’ law in information retrieval are different formulations of the same phenomenon. Stated briefly and in linguistic terms, they state that vocabularies’...
In this paper, for the first time, we present global curves for the measures precision, recall, fallout and miss in function of the number of retrieved documents. Different curves apply for different...
Properties of the n-overlap vector and n-overlap similarity theory (2005)
In the first part of this paper we define the n-overlap vector whose coordinates consist of the fraction of the objects (e.g. books, N-grams,…) that belong to 1, 2,…, n sets (more generally:...
The dependence of the height of a Lorenz curve of a Zipf function on the size of the system (2005)
The Lorenz curve of a Zipf function describes, graphically, the relation between the fraction of the items and the fraction of the sources producing these items. Hence it generalizes the so-called...
This paper extends the Lorenz theory, developed in [L. Egghe and R. Rousseau. Symmetric and asymmetric theory of relative concentration and applications. Scientometrics 52(2), 261-290, 2001], so that...
EGGHE, Leo, Ravichandra Rao, Inna Kedage, Sahoo, Bibhuti Bhusan
In a recent paper [H.F. Moed and E. Garfield: In basic science the percentage of “authoritative” references decreases as bibliographies become shorter. Scientometrics 60(3), 295-303, 2004] the...
This article calculates probabilities for the occurrence of different types of papers such as genius papers, basic papers, ordinary papers or insignificant papers. The basis of these calculations are...
On the relation between the Maximum Entropy Principle and the Principle of Least Effort (2005)
The Maximum Entropy Principle (MEP) maximizes the entropy subject to the constraint that the effort remains constant. The Principle of Least Effort (PLE) minimizes the effort subject to the...
Power laws as defined in 1926 by A. Lotka are increasing in importance because they have been found valid in varied social networks including the Internet. In this article some unique properties of...
Relations between the continuous and the discrete Lotka power function (2005)
The discrete Lotka power function describes the number of sources (e.g., authors) with n = 1, 2, 3, ... items (e.g., publications). As in econometrics, informetrics theory requires functions of a...
A characterization of the law of Lotka in terms of sampling (2005)
In order to model the variable T (the age of citations received by scientific works) with data elaborated by the Institute of Scientific Information, we have used some of the instruments already...
Zipfian and Lotkaian continuous concentration theory (2005)
This paper studies concentration (i.e. inequality) aspects of the functions of Zipf and of Lotka. Since both functions are power laws (i.e. they are mathematically the same) it suffices to...
N-grams are generalized words consisting of N consecutive symbols (letters), as they are used in a text. N-word phrases are general concepts consisting of N consecutive words, also as used in a text....
The share of items of highly productive sources as a function of the size of the system (2005)
The research in this paper is based on the paper “D.W. Aksnes and G. Sivertsen. The effect of highly cited papers on national citation indicators. Scientometrics 59(2), 213-224, 2004” where one...
If we fix a citing period and a cited period, the Rowlands Journal Diffusion factor (RJDF) is the number of different citing journals divided by the total number of citations. The Frandsen Journal...
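The Rowlands Journal Diffusion factor as stated above is a simple ratio, sketched here in Python. The input format (one citing-journal name per received citation), the function name, and the sample data are assumptions made for illustration. The Frandsen variant is not sketched because its definition is cut off in the abstract.

```python
def rowlands_jdf(citing_journals):
    """Number of different citing journals divided by the total number of citations.

    citing_journals: one journal name per citation received in the fixed
    citing and cited periods (hypothetical input format).
    """
    return len(set(citing_journals)) / len(citing_journals)

# Hypothetical data: five citations coming from three different journals.
citations = ["JASIST", "Scientometrics", "JASIST", "IPM", "JASIST"]
print(rowlands_jdf(citations))
```

The value lies between 1/n (all n citations from one journal) and 1 (every citation from a different journal).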
Existence theorem of the quadruple (P, R, F, M): Precision, Recall, Fallout and Miss (2005)
In an earlier paper [L. Egghe. A universal method of information retrieval evaluation: the “missing” link M and the universal IR surface, Information Processing and Management 40, 21-30, 2004]...
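To make the quadruple (P, R, F, M) concrete, the sketch below computes all four values for one retrieval result in Python. The abstract does not spell out the definitions, so the usual ones are assumed: precision and recall relative to the retrieved and relevant sets, fallout as the fraction of non-relevant documents that are retrieved, and miss as the fraction of non-retrieved documents that are relevant. The function name and sample sets are hypothetical.

```python
def prfm(universe, retrieved, relevant):
    """Return precision, recall, fallout and miss as a dict (assumed definitions)."""
    universe, retrieved, relevant = set(universe), set(retrieved), set(relevant)
    not_retrieved = universe - retrieved
    not_relevant = universe - relevant
    hits = retrieved & relevant
    return {
        "P": len(hits) / len(retrieved),                          # precision
        "R": len(hits) / len(relevant),                           # recall
        "F": len(retrieved & not_relevant) / len(not_relevant),   # fallout
        "M": len(not_retrieved & relevant) / len(not_retrieved),  # miss
    }

# Hypothetical collection of 10 documents identified by integers.
scores = prfm(range(10), retrieved={0, 1, 2, 3}, relevant={2, 3, 4, 5, 6})
print(scores)
```

The sketch assumes that the retrieved, non-retrieved, relevant and non-relevant sets are all non-empty, since each serves as a denominator.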
This paper introduces weighted Lorenz curves of a continuous variable, extending the discrete theory as well as the non-weighted continuous model. Using publication scores (in function of time) as...
We study new and existing data sets which show that growth rates of sources usually are different from growth rates of items. Examples: references in publications grow with a rate that is different...
Comparing partial and truncated conglomerates from a concentration theoretic point of view (2005)
When studying numerical properties of a population (technically: a conglomerate) it often happens that not all data are known. It might be that the total number of objects (persons) in the population...
Dynamics of a field list of internationally visible journals : a stochastic model (2005)
A basic model for the dynamics of a field list of internationally visible journals is constructed. We study, in function of time, the remainder of an initially chosen set of source journals. The...
Quantitative aspects of the management of the modern (scientific) library (2004)
This paper and talk examines aspects of data collection for the management of a modern (scientific) library. We discuss: reports as a public relations and public awareness tool, norms and standards,...
A local hierarchy theory for acyclic digraphs (2004)
Local hierarchy theory focuses on direct links in acyclic digraphs. In- and out-degrees are used to determine the local hierarchical number for each vertex in the graph. Together, these local...
Positive reinforcement and 3-dimensional informetrics (2004)
We show that the composition of two information production processes (IPPs), where the items of the first IPP are the sources of the second, and where the ranks of the sources in the first IPP agree...
Solution of a problem of Buckland on the influence of obsolescence on scattering (2004)
In an old paper [M.K. Buckland. Are obsolescence and scattering related? Journal of Documentation 28 (3) (1972) 242-246] Buckland poses the question if certain types of obsolescence of scientific...
The source-item coverage of the Lotka function (2004)
The following problem has never been studied: given A, the total number of items (e.g. articles) and T, the total number of sources (e.g. journals that contain these articles) (hence A>T), when is...
Similarity between objects (documents, persons, answers to a questionnaire, etc.) is generally determined through relations between representations of these objects. In the case of binary...
Vector retrieval, fuzzy retrieval and the universal fuzzy IR surface for IR evaluation (2004)
It is shown that vector information retrieval (IR) and general fuzzy IR use two types of fuzzy set operations: the original "Zadeh min–max operations" and the so-called "probabilistic sum and...
The paper shows that the present evaluation methods in information retrieval (basically recall R and precision P and in some cases fallout F) lack universal comparability in the sense that their...
How to measure own-group preference? A novel approach to a sociometric problem (2004)
In this article we present a precise definition of the notion "own-group preference" and characterize all functions capable of correctly measuring it. Examples of such functions are provided. The...
Type/Token-Taken informetrics (2003)
Type/Token-Taken informetrics is a new part of informetrics that studies the use of items rather than the items themselves. Here, items are the objects that are produced by the sources (e.g., journals...
The Byline: Thoughts on the distribution of author ranks in multiauthored papers (2003)
EGGHE, Leo, Liming, Liang, ROUSSEAU, Ronald
We analyze the multi-authorship matrix M, defined as the matrix where a cell M(j,k) denotes the number of times authors with j publications are ranked as kth author of an article. We prove that if...
This article brings the underlying structure of different relative indicators to the forefront. Special attention is given to the relative impact of a journal within a set of journals, a so-called...
A measure for the cohesion of weighted networks (2003)
A generalization of both the Botafogo-Rivlin-Shneiderman compactness measure and the Wiener index is presented. These new measures for the cohesion of networks can be used in case a dissimilarity...
Compactness as introduced by Botafogo, Rivlin and Shneiderman, in short: BRS-compactness, is studied in general, as it can be used to describe the cohesion of parts of the internet or collaboration...
Ordered sets of documents are encountered more and more in information distribution systems, such as information retrieval systems. Classical similarity measures for ordinary sets of documents hence...
A distribution of papers based on the fractional counting: an empirical study. (2003)
Ravichandra Rao, Inna Kedage, Sahoo, Bibhuti Bhusan, EGGHE, Leo
Strong similarity measures for ordered sets of documents in information retrieval (2002)
A general method is presented to construct ordered similarity measures (OS-measures), i.e., similarity measures for ordered sets of documents (as, e.g., being the result of an IR-process), based on...
A general frame-work for relative impact indicators (2002)
This article brings the underlying structure of different relative indicators to the forefront. This leads to a powerful device for constructing new indicators. Special attention is given to the...
Mathematical foundations of information retrieval. (2002)
Univ Diepenbeek, B-3590 Diepenbeek, Belgium. Egghe, L, Univ Diepenbeek, Univ Campus, B-3590 Diepenbeek, Belgium.
Determining the core of a field's literature, i.e. its 'most important' sources, has been and still is an important problem in bibliometrics. In this article an exact definition of a core of a...
Fractional frequency distributions of, for example, authors with a certain (fractional) number of papers are very irregular and, therefore, not easy to model or to explain. This article gives a first...
Theory and experimentation on the most-recent-reference distribution (2002)
The cumulative distribution of the age of the most-recent-reference distribution is the “dual” variant of the first-citation distribution. The latter has been modelled in previous...
In digraphs one has a hierarchy based on the unidirectional order between the vertices of the graph. We present a method of measuring degrees of hierarchy as expressed by the inequality that exists...
Co-citation, bibliographic coupling and a characterization of lattice citation networks (2002)
In this article we study directed, acyclic graphs. We introduce the head and tail order relations and study some of their properties. Recalling the notions of generalized bibliographic coupling and...
Sampling and concentration values of incomplete bibliographies (2002)
This paper studies concentration aspects of bibliographies. More in particular we study the impact of incompleteness of such a bibliography on its concentration values (i.e. its degree of inequality...
Lorenz curves were invented to model situations of inequality in real life and applied in econometrics (distribution of wealth or poverty), biometrics (distribution of species richness), and...
Comments on the "Letter to the Editor" by Burrell (2001)
LUC, B-3590 Diepenbeek, Belgium. Egghe, L, LUC, Univ Campus, B-3590 Diepenbeek, Belgium.
Comments on the "Letter to the Editor" by Burrell (2001)
Limburgs Univ Ctr, B-3590 Diepenbeek, Belgium. Egghe, L, Limburgs Univ Ctr, Univ Campus, B-3590 Diepenbeek, Belgium.
A heuristic study of the first-citation distribution (vol 48, pg 345, 2000) (2001)
Limburgs Univ Ctr, B-3590 Diepenbeek, Belgium. Egghe, L, Limburgs Univ Ctr, Univ Campus, B-3590 Diepenbeek, Belgium.
Determining the core of a field's literature, i.e. its 'most important' sources, has been and still is an important problem in bibliometrics. In this article an exact definition of a core of a...
A noninformetric analysis of the relationship between citation age and journal productivity (2001)
A problem, raised by Wallace (JASIS, 37, 136-145, [1986]), on the relation between the journal's median citation age and its number of articles is studied. Leaving open the problem as such, we give a...
Theory of first-citation distributions and applications (2001)
EGGHE, Leo, Ravichandra Rao, Inna Kedage
The general relation between the first-citation distribution and the general citation-age-distribution is shown. It is shown that, if Lotka's exponent α = 2, both distributions are...
Symmetric and Asymmetric Theory of Relative Concentration and Applications (2001)
Relative concentration theory studies the degree of inequality between two vectors (a1,...,aN) and (α1,...,αN). It extends concentration theory in the sense that, in the latter theory, one of the...
The detection of double errors in ISBN- and ISSN-like codes (2001)
Coding of the ISBN and ISSN is studied, and possible alternatives, not all equivalent to the official ones, are formulated. A minimum requirement for a useful code is that all single errors as well...
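As a point of reference for the error-detection requirement mentioned above, the sketch below checks a 10-digit ISBN with the familiar modulus-11 rule (weights 10 down to 1, with X standing for 10 in the last position). Treating this as the "official" code discussed in the abstract is an assumption, and the alternative codes studied in the paper are not reproduced. The function name is hypothetical.

```python
def isbn10_is_valid(isbn):
    """Check an ISBN-10 with the modulus-11 rule (weights 10, 9, ..., 1)."""
    chars = [ch for ch in isbn.upper() if ch not in "- "]
    if len(chars) != 10:
        return False
    total = 0
    for weight, ch in zip(range(10, 0, -1), chars):
        if ch == "X" and weight == 1:
            value = 10  # X is only allowed as the check character
        elif ch in "0123456789":
            value = int(ch)
        else:
            return False
        total += weight * value
    return total % 11 == 0

print(isbn10_is_valid("0-306-40615-2"))  # a well-formed ISBN-10
print(isbn10_is_valid("0-306-40615-3"))  # same number with one wrong digit
print(isbn10_is_valid("0-306-40651-2"))  # same number with two adjacent digits swapped
```

Because 11 is prime and all weights are distinct and non-zero modulo 11, any single wrong digit and any transposition of two different digits changes the weighted sum by a non-multiple of 11.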
Probabilistisch model van het spel BLOKKEN van TV1 en het vermoeden van Ben Crabbé [Probabilistic model of the TV1 game BLOKKEN and the conjecture of Ben Crabbé] (2000)
This article arose from the author's surprise at the regularly repeated claim by Ben Crabbé (presenter of BLOKKEN) that "a player who, for the second time, the game BLOKKEN...
New informetric aspects of the Internet: some reflections - many problems (2000)
This paper poses more problems than it solves: it investigates the new (virtual) world of the Internet and the challenges that it offers for informetric analysis. The paper studies five different...
A Heuristic Study of the First-Citation Distribution (2000)
The first-citation distribution, i.e. the cumulative distribution of the time period between publication of an article and the time it receives its first citation, has never been modelled by using...
Aging, obsolescence, impact, growth and utilization: definitions and relations (2000)
The notions aging, obsolescence, impact, growth, utilization and their relations are studied. It is shown how to correct an observed citation distribution for growth, once the growth distribution is...
This paper establishes the general relation between the distribution of N-tuples of letters (e.g., N-truncations, N-grams) or words (e.g., N-word phrases) and the distributions of the single letters...
The Distribution of N-Grams (2000)
N-grams are generalized words consisting of N consecutive symbols, as they are used in a text. This paper determines the rank-frequency distribution for redundant N-grams. For entire texts this is...
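A rank-frequency distribution of N-grams can be tabulated with a few lines of Python. The sketch assumes that an N-gram is taken at every position of the text (a sliding window of N consecutive symbols); whether this matches the paper's notion of "redundant" N-grams is not confirmed by the abstract. Function names and the sample string are hypothetical.

```python
from collections import Counter

def ngrams(text, n):
    """All windows of n consecutive symbols, one per starting position."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def rank_frequency(text, n):
    """N-grams with their counts, ordered from most to least frequent."""
    return Counter(ngrams(text, n)).most_common()

# Hypothetical sample text.
for rank, (gram, count) in enumerate(rank_frequency("abcabcab", 2), start=1):
    print(rank, gram, count)
```

A text of length L yields L - N + 1 windows, so the counts in the table always sum to that number.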
Partial orders and measures for language preferences (2000)
Relative own-language preference depends on two parameters: the publication share of the language, and the self-citing rate. Openness of language L with respect to language J depends on three...
EGGHE, Leo, ROUSSEAU, Ronald, Van Hooydonk, G.
One aim of science evaluation studies is to determine quantitatively the contribution of different players (authors, departments, countries) to the whole system. This information is then used to...
A review of ranking problems in scientometrics and informetrics (2000)
The paper highlights problems with the ranking of average scores of scientific groups. These can be considered in a wide sense - e.g. scientific disciplines formed by journals or authors and where...
Observed aging curves are influenced by publication delays. In this article, we show how the undisturbed aging function and the publication delay combine to give the observed aging function. This...
A model for measuring the congestion in library shelves. (1999)
A model for measuring the congestion in library shelves after j years (j ∈ ℕ) is obtained by taking j-fold convolutions of the distributions that describe the yearly growth of literature (e.g....
The "own-language preference": measures of "relative language self-citation" (1999)
EGGHE, Leo, ROUSSEAU, Ronald, Yitzhaki, M.
It has already been pointed out that the foreign language barrier is probably the greatest impediment to the free flow and transfer of information. This barrier is even growing as scientists of more...
Consider a country's national output, measured by counting the number of authors from country c that collaborate in every paper in a bibliography. Depending on whether or not country c appears at least...
Detection and correction of multiple errors in general block codes (1999)
The continuum modelling of cell migration during cancer invasion results in the coupling of parabolic and hyperbolic partial differential equations (PDEs) arising from the random motility of normal...
An application of martingales in the limit to a problem in information science (1999)
Martingales in the limit (mils) were introduced about two decades ago as nontrivial extensions of martingales. It was proved in 1976 that they have good convergence properties (at least) for...
On the law of Zipf-Mandelbrot for multi-word phrases (1999)
The paper studies the probabilities of the occurrence of m - word phrases (m=2,3, ...) in relation with the probabilities of occurrence of the single words. It is well-known that, in the latter case,...
Mathematical theories of citation (1998)
The paper focusses on possible mathematical theories of citation and on the intrinsic problems related to it. It sheds light on aspects of mathematical complexity as e.g. encountered in fractal...
Core collections, as e.g., the set of source journals selected by the Institute for Scientific Information, vary from year to year: most of them stay, some leave, and others enter. In this paper, we...
Topologies for retrieval systems are generated by certain subsets, called retrievals. In this article we show how recall and precision can be expressed using only retrievals. Different types of...
Properties of topologies of information retrieval systems (1998)
This paper studies topological properties of different topologies that are possible on the space of documents as they are induced by queries in a query space together with a similarity function...
Topological aspects of information retrieval (1998)
Let (DS, QS, sim) be a retrieval system consisting of a document space DS, a query space QS, and a function sim, expressing the similarity between a document and a query. Following D. M. Everett and...
A metric characterization of the Lorenz dominance order (1998)
A metric on the space of real N-vectors R^N is defined which characterises the Lorenz dominance order X < Y for X, Y ∈ R^N. The metric d is derived from the Euclidean norm ‖X‖, on...
Fractal and informetric aspects of hypertext systems (1997)
The paper studies fractal features (such as the fractal dimension) of hypertext systems (such as WWW) and establishes the link with informetric parameters. More concretely, a formula for the fractal...
The paper describes the project RECOSCIX-WIO (Regional Co-operation in Scientific Information Exchange in the Western Indian Ocean Region). Details are given on the project's history, operational...
Duality in information retrieval and the hypergeometric distribution (1997)
Duality is an important topic in informetrics, especially in connection with the classical informetric laws. Yet, this concept is less studied in information retrieval. It deals with the unification...
Bradford's law was first formulated in Bradford (1934) on data concerning the subject "Applied Geophysics" in the period 1928-1931 and "Lubrication" in the period...
Quantitative aspects of the management of health information. (1997)
We report on quantitative techniques for managing health information as well as for the marketing and the management of health libraries and other health information centres. Important for marketing...
Price index and its relation to the mean and median reference age. (1997)
This article consists of two parts. In the first part, we assume the simple decreasing exponential model for aging. In this case, we prove that the Price Index (the fraction of the references that...
The Israel Academy of Sciences and Humanities Edited by (1997)
Bluma C. Peritz, Leo Egghe, Mrs R. Askenazi, Bluma C. Peritz, Leo Egghe, Abraham Bookstein, ...
We prove a theorem about the invariance of the Lotka function under a transformation which maps numbers of authors as variables to number of collaborators as variables. Moreover, we describe a...
This article extends two previous articles on the application of martingale theory to the well-known generalized ''success-breeds-success'' principle, generalized in order to comprise also other...
Average and global impact of a set of journals (1996)
In this note we clarify some notions concerning citations, publications, and their quotients: impact and indifference (a measure of invisibility, introduced in this article). In particular, we show...
Averaging and globalising quotients of informetric and scientometric data (1996)
Based on the particular case of the average impact factor of a subfield versus the impact factor of this subfield as a whole, the difference is studied between an average of quotients, denoted as AQ,...
Stochastic processes determined by a general success-breeds-success principle (1996)
The general ''success-breeds-success'' (SBS) principle as introduced in a previous paper extends the classical SBS principle in that the allocation of items over sources is determined by a more...
ON THE INFLUENCE OF PRODUCTION ON UTILIZATION FUNCTIONS - OBSOLESCENCE OR INCREASED USE (1995)
EGGHE, Leo, RAO, Ravichandra, ROUSSEAU, Ronald
We study the influence of production on utilization functions. A concrete example of this is the influence of the growth of literature on the obsolescence (aging) of this literature. Here,...
SENSITIVITY ASPECTS OF INEQUALITY MEASURES (1995)
The purpose of this article is to study inequality measures with respect to their sensitivity to transfers. Sensitivity is studied by means of a particular directional derivative. We observe that...
The success-breeds-success principle (SBS principle) is reformulated in order to generate a general theory of source-item relationships. Several extensions are included such as a time-dependent...
This article makes the obvious but rather unexploited remark that there is a structural difference between author-publication systems and, for example, journal-article systems, in the sense that...
BRIDGING THE GAPS - CONCEPTUAL DISCUSSIONS ON INFORMETRICS (1994)
In this paper we discuss the possible gaps between several subdisciplines in informetrics and between informetrics and other -metrics disciplines such as econometrics, sociometrics and so on. It is...
LITTLE SCIENCE, BIG SCIENCE ... AND BEYOND (1994)
LUC,B-3590 DIEPENBEEK,BELGIUM.EGGHE, L, UIA,UNIV PLEIN 1,B-2610 WILRIJK,BELGIUM.
A THEORY OF CONTINUOUS RATES AND APPLICATIONS TO THE THEORY OF GROWTH AND OBSOLESCENCE RATES (1994)
For functions f of a continuous variable t, we define the term ''rate'' (as, e.g., rate of growth or of obsolescence) as the exponential function of the derivative of the logarithm of this function...
In a recent paper, Rousseau [1] notes the fact that if we give weights of 1/m to each author in an m-authored paper, Lotka's law does not apply. However, he also notes that the function modeling the...
The generalized 80/20-rule states that 100 x % of the most productive sources in (for example) a bibliography produce 100 y % of the items and one is interested in the relation between y and x. The...
EVOLUTION OF INFORMATION PRODUCTION PROCESSES AND ITS RELATION TO THE LORENZ DOMINANCE ORDER (1993)
We investigate the evolution and growth of information production processes (in short IPPs). An important role in this investigation is played by the Lorenz curve of concentration and the...
ON THE INFLUENCE OF GROWTH ON OBSOLESCENCE (1993)
In many papers, the influence of growth on obsolescence is studied but a formal model for such an influence has not been constructed. In this paper, we develop such a model and find different results...
CLASSIFICATION OF GROWTH-MODELS BASED ON GROWTH-RATES AND ITS APPLICATIONS (1992)
In this paper, growth models are classified and characterised using two types of growth rates: from time t to t + 1 and from time t to 2t. They are interesting in themselves but can also be used for...
GENERALIZED TRANSFER PRINCIPLES IN ECONOMETRICS AND INFORMETRICS (1992)
The generalized (also called extended) transfer principles as introduced in two earlier papers by Egghe and Rousseau are known to be stronger properties than the classical transfer principle of...
THEORY OF SEARCH KEYS AND APPLICATIONS IN RETRIEVAL TECHNIQUES USED BY CATALOGERS (1992)
This paper constructs a model for studying the performance of search keys of several types (such as, e.g., author/title keys of the form 4/4, 3/3, 3/1/1/1, and so on), and gives a criterion for...
CITATION AGE DATA AND THE OBSOLESCENCE FUNCTION - FITS AND EXPLANATIONS (1992)
The paper deals with the shape of the obsolescence function, which one can construct, based on the age data of reference lists. This paper shows that the obsolescence factor (aging factor) a is not a...
DUALITY ASPECTS OF THE GINI INDEX FOR GENERAL INFORMATION PRODUCTION PROCESSES (1992)
This paper studies information production processes (IPP) (e.g., bibliographies) from the point of view of concentration theory. More specifically, the Gini index is studied for an IPP as well as for...
TRANSFER PRINCIPLES AND A CLASSIFICATION OF CONCENTRATION MEASURES (1991)
In this article we show that the notion of concentration (or inequality) can best be studied by applying a number of transfer principles. We prove this by showing that transfer principles imply other...
THEORY OF COLLABORATION AND COLLABORATIVE MEASURES (1991)
LIMBURGS UNIV CENTRUM,B-3590 DIEPENBEEK,BELGIUM.EGGHE, L, UNIV INSTELLING ANTWERP,B-2610 WILRIJK,BELGIUM.
THE EXACT PLACE OF ZIPFS AND PARETOS LAW AMONGST THE CLASSICAL INFORMETRIC LAWS (1991)
In this paper, the special place of Zipf's law and Pareto's law amongst other classical informetric laws (such as Bradford's graphical and verbal law, Weber-Fechner's or Brookes', Leimkuhler's and...
THEORY OF COLLABORATION AND COLLABORATIVE MEASURES (1991)
The paper discusses earlier attempts by Ajiferuke, Burrell, & Tague and by Englisch to define a single measure of collaboration. We show that the variables used in these papers are too rough and...
Rational normalization of concentration measures (1991)
BONCKAERT, Patrick, EGGHE, Leo
We study normalization features of good concentration measures. We extend the classical normalization to the case in which one requires a linear dependence between rational fractions of occupation...
I. Statistics This part begins with elementary descriptive statistics and elements of probability. It continues with a chapter on inferential statistics, including regression, correlation and...
Introduction to Informetrics (1990)
The book deals with the following topics: informetrics, bibliometrics, scientometrics, descriptive statistics, probability, inferential statistics, sampling, multivariate statistics, operations...
New Bradfordian laws equivalent with old Lotka, evolving from a source-item argument (1990)
Based on the duality techniques in a previous paper (L. Egghe, The duality of informetric systems with applications to the empirical laws), we study general relationships between Bradfordian and...
Elements of concentration theory (1990)
We review some concentration measures proposed in the literature and present a set of principles that good concentration measures must fulfill. We moreover look into some of the consequences of...
In this paper we show that the Fussler sampling technique in book shelves is always better than systematic sampling by length. So far this result was only known to be true in the idealized situation...
This paper proves two regularities that were found in the paper [V. Larivière, E. Archambault and Y. Gingras (2007). Long-term patterns in the aging of the scientific literature, 1900-2004....
New relations between similarity measures for vectors based on vector norms
The well-known similarity measures Jaccard, Salton’s cosine, Dice and several related overlap measures for vectors are compared. While general relations are not possible to prove, we study these...
Comparative study of h-index sequences
This paper studies four different h-index sequences (different in publication periods and/or citation periods). Lotkaian models for these h-index sequences are derived by mutual comparison of one...
Collaboration and productivity: an investigation in Scientometrics and in a university repository
EGGHE, Leo, GOOVAERTS, Marc, Kretschmer, H.
In this paper we investigate the following problem: for a fixed field or institute, can we prove that, the higher the number of papers of an author (calculated in the total way), the higher his/her...
Weak and strong convergence of amarts in Fréchet spaces
Several new characterizations of nuclearity in Fréchet spaces are proved. The most important one states that a Fréchet space is nuclear if and only if every mean bounded amart is strongly a.s....