Statistics is a branch of knowledge, a science that sets out the general questions of collecting, measuring, monitoring and analyzing mass statistical (quantitative or qualitative) data; it studies the quantitative side of mass social phenomena in numerical form [1].
The word "statistics" comes from the Latin status, "the state of affairs" [2]. The term was introduced into science by the German scholar Gottfried Achenwall in 1746, when he proposed replacing the name of the course "State Studies" taught at German universities with "Statistics", thereby laying the foundation for statistics as a science and academic discipline. Statistical record-keeping, however, is much older: population censuses were conducted in Ancient China, the military potential of states was compared, citizens' property was recorded in Ancient Rome, and so on [3].
Statistics develops a special methodology for collecting and processing material: mass statistical observation, the grouping method, averages, indices, the balance method, graphical representation, cluster, discriminant, factor and component analyses, optimization and other methods of statistical data analysis.
Developing Statistics
Statistical practice began at about the time the state arose. The clay tablets of the Sumerian kingdom (3rd to 2nd millennium BC) can be considered the first published statistical information.
At first, statistics was understood as a description of the economic and political condition of a state or a part of it. For example, a definition dating from 1792 reads: "statistics describes the condition of a state at present or at some known moment in the past." The activities of state statistical services still fit this definition well [4].
Gradually, however, the term "statistics" came to be used more widely. According to Napoleon Bonaparte, "statistics is the budget of things" [5]. Statistical methods thus proved useful not only for administrative management but also at the level of the individual enterprise. According to a formulation of 1833, "the purpose of statistics is to present facts in the most concise form" [6]. In the second half of the 19th and the early 20th century a scientific discipline took shape, mathematical statistics, which is part of mathematics.
In the 20th century, statistics came to be regarded primarily as an independent scientific discipline: a set of methods and principles governing the collection, analysis, comparison, presentation and interpretation of numerical data. In 1954, B. V. Gnedenko, Academician of the Academy of Sciences of the Ukrainian SSR, gave the following definition: "Statistics consists of three sections:
- collection of statistical information, that is, information characterizing individual units of any mass aggregates;
- a statistical study of the data obtained, which consists in clarifying those patterns that can be established on the basis of mass observation data;
- development of methods for statistical observation and for the analysis of statistical data." The last section is, in essence, the content of mathematical statistics [7].
The term "statistics" is used in two more senses. First, in everyday life "statistics" often means a set of quantitative data about a phenomenon or process. Second, a statistic is a function of the results of observations, used to estimate the characteristics and parameters of distributions and to test hypotheses.
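The second sense can be illustrated with a minimal Python sketch in which a statistic is simply a function applied to a list of observations. The function names and the sample values are illustrative assumptions and do not come from the sources cited here.

```python
from math import sqrt

def sample_mean(xs):
    """Sample mean: an estimate of the expected value of the distribution."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Unbiased sample variance (divides by n - 1)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def t_statistic(xs, mu0):
    """One-sample Student t statistic for the hypothesis 'the mean equals mu0'."""
    n = len(xs)
    return (sample_mean(xs) - mu0) / sqrt(sample_variance(xs) / n)

# Hypothetical measurements, used only to show the calls.
observations = [4.8, 5.1, 5.0, 4.9, 5.2]
print(sample_mean(observations), sample_variance(observations))
print(t_statistic(observations, mu0=5.0))
```

The first two functions estimate characteristics of a distribution, while the third is a statistic used to test a hypothesis, which matches the two uses named above.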
A Brief History of Statistical Methods
Typical examples of the early use of statistical methods are described in the Bible, in the Old Testament, which gives, in particular, the number of warriors in various tribes. From a mathematical point of view, this amounted to counting how many times the values of the observed characteristics fell into certain gradations.
Immediately after the emergence of probability theory (Pascal, Fermat, 17th century), probabilistic models began to be used in processing statistical data. For example, the frequency of births of boys and girls was studied, the probability of a boy being born was found to differ from 0.5, and the reasons why this probability in Parisian shelters differed from that in Paris as a whole were analyzed.
In 1794 (according to other sources, in 1795) the German mathematician Carl Friedrich Gauss formalized one of the methods of modern mathematical statistics, the method of least squares [8]. In the 19th century the Belgian Adolphe Quetelet made a significant contribution to practical statistics: by analyzing a large body of real data, he showed the stability of relative statistical indicators, such as the proportion of suicides among all deaths [9].
The first third of the 20th century was the era of parametric statistics. Methods were studied that rest on the analysis of data from parametric families of distributions described by the curves of the Pearson family. The most popular was the normal distribution. Hypotheses were tested with the criteria of Pearson, Student and Fisher. The maximum likelihood method and analysis of variance were proposed, and the basic ideas of the design of experiments were formulated.
The theory of data analysis developed in the first third of the 20th century is called parametric statistics, since its main object of study is samples from distributions described by one or a small number of parameters. The most common is a family of Pearson curves defined by four parameters. As a rule, it is impossible to indicate any good reasons why the distribution of the results of specific observations should be included in one or another parametric family. The exceptions are well known: if the probabilistic model provides for the summation of independent random variables, then the sum is naturally described by a normal distribution; if, in the model, the product of such quantities is considered, then the result, apparently, is approximated by the lognormal distribution, and so on.
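The two exceptions just mentioned can be explored with a small simulation. The sketch below is only an illustration under assumptions of its own: the choice of uniform terms on (0.5, 1.5), the number of terms, the sample size and the use of sample skewness as a rough symmetry check are not taken from the source.

```python
import math
import random

def skewness(xs):
    """Sample skewness: close to 0 for a symmetric, bell-shaped sample."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    return sum((x - m) ** 3 for x in xs) / n / s2 ** 1.5

rng = random.Random(0)  # fixed seed so that the run is repeatable
N_TERMS, N_SAMPLES = 30, 5000  # illustrative sizes

sums, products = [], []
for _ in range(N_SAMPLES):
    terms = [rng.uniform(0.5, 1.5) for _ in range(N_TERMS)]
    sums.append(sum(terms))           # summation model
    products.append(math.prod(terms))  # multiplicative model

# If the products are roughly lognormal, their logarithms are roughly normal.
log_products = [math.log(p) for p in products]

print("skewness of sums:        ", round(skewness(sums), 3))
print("skewness of products:    ", round(skewness(products), 3))
print("skewness of log-products:", round(skewness(log_products), 3))
```

In such a run the sums and the logarithms of the products should look roughly symmetric, while the products themselves are strongly right-skewed, which is what one would expect if the sum is close to normal and the product close to lognormal.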
Types of Grouping
Statistical grouping is the division of a population into groups (intervals of variation of a parameter) that are homogeneous in some respect. The number of such intervals (groups) is calculated by the Sturges formula:
$k = 1 + \log_2 n \approx 1 + 3.322 \lg n$,
where k is the number of intervals and n is the number of observations.
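As a rough illustration, the Sturges rule in its commonly cited form k = 1 + log2(n) can be sketched in Python as follows. Rounding the logarithm up to an integer, splitting the range into equal-width intervals, and the function names are assumptions of this sketch, not something the source specifies.

```python
import math

def sturges_intervals(n):
    """Number of grouping intervals k = 1 + log2(n), with log2(n) rounded up."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    return 1 + math.ceil(math.log2(n))

def group_counts(values):
    """Count observations in k equal-width intervals chosen by the Sturges rule."""
    k = sturges_intervals(len(values))
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # fall back to 1.0 when all values are equal
    counts = [0] * k
    for v in values:
        i = min(int((v - lo) / width), k - 1)  # the maximum goes to the last interval
        counts[i] += 1
    return counts

print(sturges_intervals(100))           # 8 intervals for 100 observations
print(group_counts(list(range(100))))   # counts per interval
```

For 100 observations the sketch gives 8 intervals, which agrees with rounding 1 + 3.322 lg 100 = 7.644 to the nearest integer.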
There are three types of grouping: analytical, typological, structural.
- Analytical grouping - allows you to identify the relationship between the groups.
- Typological grouping - the division of the studied population into homogeneous groups.
- Structural grouping - the division of a homogeneous population into groups according to a certain attribute.
Typical groups are maximally homogeneous within and heterogeneous between one another. Groupings are either primary or secondary: primary groupings are obtained from statistical observations, while secondary groupings are derived from primary ones.
Statistical Methods
Statistical methods are methods for analyzing statistical data. A distinction is drawn between methods of applied statistics, which can be used in all areas of scientific research and in any sector of the economy, and other statistical methods whose applicability is limited to a particular field. The latter include statistical acceptance control, statistical regulation of technological processes, reliability and testing, and design of experiments.
Classification of statistical methods
Statistical methods of data analysis are used in almost all areas of human activity. They are always used when it is necessary to obtain and justify any judgments about a group (objects or subjects) with some internal heterogeneity.
It is advisable to distinguish three types of scientific and applied activities in the field of statistical methods of data analysis (according to the degree of specificity of the methods associated with immersion in specific problems):
a) the development and research of general-purpose methods, without taking into account the specifics of the scope;
b) the development and study of statistical models of real phenomena and processes in accordance with the needs of a particular field of activity;
c) the use of statistical methods and models for the statistical analysis of specific data in solving applied problems, for example, when conducting sample surveys.
Applied Statistics
Applied statistics is the science of how to process data of an arbitrary nature. Its mathematical basis, like that of statistical analysis methods generally, is probability theory and mathematical statistics.
Any statistical study begins with a description of the type of data and of the mechanism that generates them. Both deterministic and probabilistic methods are used to describe data. Deterministic methods can analyze only the data actually available to the researcher. They are used, for example, to produce the tables that official state statistics bodies compile from the statistical reports submitted by enterprises and organizations. The results can be carried over to a wider population, and used for prediction and control, only on the basis of probabilistic-statistical modeling. For this reason, mathematical statistics is often taken to include only methods based on probability theory.
In the simplest case, statistical data are the values of some characteristic of the objects under study. The values may be quantitative or may indicate a category to which the object belongs. In the latter case one speaks of a qualitative attribute.
When an object is measured by several quantitative or qualitative indicators, the statistical data about it form a vector, which can be regarded as a new kind of data. The sample then consists of a set of vectors. If some coordinates are numbers and others are qualitative (categorical) data, one speaks of a vector of heterogeneous data.
A single element of the sample, that is, a single measurement, may be a whole function that describes the dynamics of an indicator, that is, its change over time: for example, a patient's electrocardiogram, the amplitude of the runout of a motor shaft, or a time series describing the dynamics of a company's indicators. The sample then consists of a set of functions.
Other mathematical objects can also be elements of a sample, for example binary relations. Thus, when surveying experts, one often uses orderings (rankings) of the objects being assessed: product samples, investment projects, or options for management decisions. Depending on the rules of the expert study, the elements of the sample can be various kinds of binary relations (orderings, partitions, tolerances), sets, fuzzy sets, etc.
The mathematical nature of the sample elements in different problems of applied statistics can thus vary widely. Two classes of statistical data can nevertheless be distinguished, numerical and non-numerical, and applied statistics is accordingly divided into numerical statistics and non-numerical statistics.
Numerical statistical data are numbers, vectors and functions. They can be added and multiplied by coefficients, so various sums are of great importance in numerical statistics. The mathematical apparatus for analyzing sums of random sample elements consists of the (classical) laws of large numbers and central limit theorems.
Non-numerical statistical data are categorical data, vectors of heterogeneous attributes, binary relations, sets, fuzzy sets, etc. They cannot be added or multiplied by coefficients, so it makes no sense to speak of sums of non-numerical data. They are elements of non-numerical mathematical spaces (sets). The mathematical apparatus for analyzing them is based on distances between elements (as well as proximity measures and dissimilarity indicators) in such spaces. Distances are used to define empirical and theoretical averages, prove laws of large numbers, construct nonparametric estimates of probability density, solve problems of diagnostics and cluster analysis, etc. (see [2]).
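The idea of an average defined through distances, rather than through sums, can be sketched for expert rankings. The example below is a simplified illustration under several assumptions: it uses the Kendall distance (the number of pairs of objects ordered differently), it searches for the average only among the sample elements themselves, and the expert data and function names are hypothetical.

```python
from itertools import combinations

def kendall_distance(r1, r2):
    """Number of pairs of objects that the two rankings order differently.

    Each ranking is a dict mapping an object to its rank position (1 = best).
    """
    return sum(
        1
        for a, b in combinations(sorted(r1), 2)
        if (r1[a] - r1[b]) * (r2[a] - r2[b]) < 0
    )

def empirical_average(sample, distance):
    """Sample element with the smallest total distance to all sample elements."""
    return min(sample, key=lambda x: sum(distance(x, y) for y in sample))

# Hypothetical rankings of three projects A, B, C by three experts.
experts = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 1, "B": 3, "C": 2},
    {"A": 2, "B": 1, "C": 3},
]
consensus = empirical_average(experts, kendall_distance)
print(consensus)
```

No ranking is ever added to another or multiplied by a coefficient here; only distances between rankings are used. A fuller treatment would minimize the total distance over the whole space of rankings rather than over the sample alone.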
Applied research uses statistical data of various kinds, which is due in particular to the ways in which the data are obtained. For example, if tests of technical devices run until a fixed moment in time, the result is so-called censored data: a set of numbers giving the time to failure of some of the devices, together with the information that the remaining devices were still working when the test ended. Censored data are often used in assessing and monitoring the reliability of technical devices.
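One possible way to represent such censored data is as pairs of a time and a flag saying whether the device actually failed. The sketch below also estimates the mean time to failure as total observed running time divided by the number of failures, which is the maximum likelihood estimate only under the additional assumption of exponentially distributed lifetimes. That assumption, the record layout and the test data are hypothetical and are not taken from the source.

```python
def exponential_mttf(records):
    """Estimate mean time to failure from right-censored test records.

    records: iterable of (time, failed) pairs, where failed is False for
    devices that were still working when the test stopped (censored).
    Assumes exponentially distributed lifetimes (an assumption of this sketch).
    """
    records = list(records)
    failures = sum(1 for _, failed in records if failed)
    if failures == 0:
        raise ValueError("no failures observed; the estimate is undefined")
    total_time = sum(t for t, _ in records)  # censored devices contribute too
    return total_time / failures

# Hypothetical test stopped at 100 hours: three failures, two survivors.
test_records = [
    (20.0, True), (55.0, True), (80.0, True),
    (100.0, False), (100.0, False),
]
print(exponential_mttf(test_records))
```

Discarding the two surviving devices and averaging only the three failure times would give a noticeably smaller value, which shows why the censoring information has to be kept with the data.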
Relationship of statistics with other disciplines
Statistics is multidisciplinary, as it uses methods and principles borrowed from other disciplines. Thus, knowledge in the field of sociology and economic theory serves as a theoretical basis for the formation of statistical science. Within the framework of these disciplines, the laws of social phenomena are studied. Statistics helps to assess the scale of a phenomenon, as well as develop a system of methods for analysis and study. Statistics, undoubtedly, is connected with mathematics, since a number of mathematical operations, methods and laws are required to identify patterns, evaluate and analyze the object of study, and the systematization of the results is reflected in the form of graphs and tables.
Statistical Analysis of Specific Data
Development Prospects
The theory of statistical methods is aimed at solving real problems. New mathematical formulations of data analysis problems therefore keep appearing in it, and new methods are developed and justified. Justification is often carried out by mathematical means, that is, by proving theorems. A major role is played by the methodological component: how to pose the problems and which assumptions to adopt for further mathematical study. Modern information technology, in particular the computer experiment, also plays a large role.
A pressing task is to analyze the history of statistical methods in order to identify development trends and use them for forecasting.
Computational statistics
The development of computer technology in the second half of the 20th century had a significant impact on statistics. Previously, statistical models were mainly linear models. Faster computers and the development of suitable numerical algorithms led to increased interest in nonlinear models, such as artificial neural networks, and to the development of complex statistical models, such as the generalized linear model and the hierarchical model.
Computationally intensive methods based on resampling, such as permutation tests and bootstrapping, have come into wide use, and methods such as Gibbs sampling have made Bayesian algorithms more widely applicable. A variety of general-purpose and specialized statistical software is now available.
Incorrect interpretation of statistical studies
There is an opinion that the data of statistical studies are increasingly being deliberately distorted or misinterpreted, with only those data selected that suit whoever is directing a particular study [10]. Misuse of statistics can be either accidental or intentional. Darrell Huff's How to Lie with Statistics (1954) sets out a number of considerations on the use and misuse of statistics. Some authors also review the statistical methods used in specific fields (e.g., Warne, Lazo, Ramos, and Ritter (2012)) [11]. Ways to avoid misinterpreting statistics include proper study design and the elimination of bias in research [12]. Abuse occurs when such conclusions are "ordered" by particular organizations that, intentionally or unconsciously, bring about the selection of biased data or samples [13]. Histograms, the simplest type of chart to use and understand, can be produced with ordinary computer programs or simply drawn by hand [12]. Most people do not try to look for errors, or are mistaken themselves, and therefore do not notice them. Thus, according to the authors, statistical data, in order to be true, should be "uncombed" (that is, reliable data should not look ideal) [13]. For the statistical data obtained to be plausible and accurate, the sample must be representative [14].
The catch phrase
The most famous (and one of the best [15]) criticisms of applied statistics, "There are three kinds of lies: lies, damned lies, and statistics", is traditionally attributed to the British Prime Minister Benjamin Disraeli, following Mark Twain's attribution in "Chapters from My Autobiography" (North American Review, July 5, 1907) [16]. Twain wrote that figures are deceptive, as he had seen from his own experience, and that Disraeli had spoken rightly on the subject: "There are three kinds of lies: lies, damned lies, and statistics." However, the phrase does not appear in Disraeli's works, and its origin is disputed. In 1964, Colin White [15] suggested that its author was François Magendie (1783-1855), who expressed the idea in French: "Ainsi l'altération de la vérité qui se manifeste déjà sous la forme progressive du mensonge et du parjure, nous offre-t-elle au superlatif, la statistique". According to White, "the world needed this phrase, and several people could be proud of having invented it."
See also
- Applied Statistics
- Mathematical statistics
- Demography
- Legal statistics
- Query Statistics
- Central Statistical Office
- Lies, blatant lies and statistics
- Nonparametric statistics
Notes
- Small Soviet Encyclopedia. - M.: Soviet Encyclopedia, 1960. - T. 8. - S. 1090.
- Raizberg B. A., Lozovsky L. Sh., Starodubtseva Ye. B. Modern economic dictionary. 5th ed., revised and expanded. - M.: INFRA-M, 2007. - 495 p. - (Library of dictionaries "INFRA-M").
- Lecture on statistics - The subject and method of statistics.
- Nikitina E. P., Freidlina V. D., Yarkho A. V. A collection of definitions of the term "statistics". - Moscow: Moscow State University, 1972.
- Chuprov A. A. Statistics. - M.: Gosstatizdat of the Central Statistical Administration of the USSR, 1960.
- Nikitina E. P., Freidlina V. D., Yarkho A. V. A collection of definitions of the term "statistics".
- Gnedenko B. V. Essay on the History of Probability Theory. - Moscow: URSS, 2001.
- Klein F. Lectures on the development of mathematics in the 19th century. Part I. - Moscow, Leningrad: Joint Scientific and Technical Publishing House of the NKTP USSR, 1937.
- Ploshko B. G., Eliseeva I. I. History of statistics: Textbook. - Moscow, Leningrad: Finance and Statistics, 1990.
- Huff, Darrell. How to Lie With Statistics. W. W. Norton & Company, Inc., New York, NY, 1954. ISBN 0-393-31072-8.
- Warne, R., Lazo, M., Ramos, T. and Ritter, N. (2012). Statistical Methods Used in Gifted Education Journals, 2006-2010. Gifted Child Quarterly, 56 (3), 134-149. doi: 10.1177/0016986212444122.
- Encyclopedia of Archeology. - Credo Reference: Oxford: Elsevier Science, 2008.
- Cohen, Jerome B. Misuse of Statistics // Journal of the American Statistical Association. - 1938. - December (vol. 33, no. 204). - P. 657-674.
- Freund, J. F. Modern Elementary Statistics // Credo Reference. - 1988.
- White, 1964.
- Mark Twain. Chapters from My Autobiography. North American Review. Project Gutenberg (September 7, 1906). Retrieved May 23, 2007. Archived April 7, 2012.
Literature
- Karaseva L. A. Statistics // World History of Economic Thought : In 6 volumes / Ch. ed. V.N. Cherkovets. - M .: Thought , 1987. - T. I. From the inception of economic thought to the first theoretical systems of political life. - S. 484-494. - 606 s. - 20,000 copies. - ISBN 5-244-00038-1 .
- Miklashevsky I.N. Theoretical statistics // Brockhaus and Efron Encyclopedic Dictionary : 86 volumes (82 volumes and 4 additional). - SPb. , 1890-1907.
- Norman Draper, Harry Smith. Applied Regression Analysis. Multiple Regression = Applied Regression Analysis. - 3rd ed. - M .: "Dialectics" , 2007. - S. 912. - ISBN 0-471-17082-8 .
- Orlov A. I. Applied statistics. Textbook. - M.: Exam, 2006. - 671 p.
- Darrell Huff. How to Lie with Statistics = How to Lie with Statistics. - M.: Alpina Publisher, 2015. - 163 p. - ISBN 978-5-9614-5212-9.
- Glinsky V. V., Ionin V. G. Statistical analysis. - M.: Infra-M, 2002. - 241 p. - (Higher education). - 5,000 copies. - ISBN 5-16-001293-1.
- White C. Unkind cuts at statisticians (English) // The American Statistician. - 1964. - Vol. 18 , no. 5 . - P. 15-17 .
Links
- Statistics - an article from the Great Soviet Encyclopedia .
- Statistics // Soviet Historical Encyclopedia
- Federal State Statistics Service of the Russian Federation - Rosstat
- "Statisticians in World War II: They also served." The Economist , Dec 20th 2014