In mathematics, the probabilistic method is used to prove the existence of mathematical objects with desired combinatorial properties: one shows that a random object chosen from an appropriate probability distribution has the desired property with positive probability. Consequently, such existence proofs are nonconstructive: they do not explicitly describe a method for constructing or computing such an object.
The method of conditional probabilities [1] [2] [3] converts such a proof, "in a very precise sense", into an efficient deterministic algorithm that is guaranteed to find an object with the desired properties. That is, the method derandomizes the proof. The basic idea is to replace each random choice in the random experiment by a deterministic choice, made so as to keep the conditional probability of failure, given the choices made so far, below 1.
The method is particularly relevant in the context of randomized rounding (which uses the probabilistic method to design approximation algorithms).
When applying the method of conditional probabilities, the technical term pessimistic estimator refers to a quantity used in place of the true conditional probability (or conditional expectation) underlying the proof.
Overview
Raghavan [3] gives the following description of the method:
- We first show the existence of a provably good approximate solution using the probabilistic method ... [We then] show that the probabilistic existence proof can be converted, in a very precise sense, into a deterministic approximation algorithm.
(Raghavan discusses the method in the context of randomized rounding, but it works with the probabilistic method in general.)
To apply the method to a probabilistic proof, the randomly chosen object in the proof must be choosable by a random experiment consisting of a sequence of "small" random choices.
Here is a trivial example to illustrate the principle.
- Lemma: It is possible to flip three coins so that the number of tails is at least 2.
- Probabilistic proof. If the three coins are flipped randomly, the expected number of tails is 1.5. Thus, there must be some outcome (a way of flipping the coins) in which the number of tails is at least 1.5. Since the number of tails is an integer, such an outcome has at least 2 tails. QED
In this example, the random experiment consists of flipping three fair coins. The experiment is illustrated by the rooted tree in the figure. There are eight outcomes, each corresponding to a leaf of the tree. A trial of the random experiment corresponds to taking a random walk from the root (the top node of the tree, where no coins have been flipped) to a leaf. The successful outcomes are those in which at least two of the coins came up tails. The interior nodes of the tree correspond to partially determined outcomes, where only 0, 1, or 2 of the coins have been flipped so far.
To apply the method of conditional probabilities, one focuses on the conditional probability of failure, given the choices made so far as the experiment proceeds step by step. In the diagram, each node is labeled with this conditional probability. (For example, if only the first coin has been flipped and it came up tails, that corresponds to the second child of the root. Conditioned on that intermediate state, the probability of failure is 0.25.)
The method of conditional probabilities replaces the random root-to-leaf walk in the random experiment by a deterministic root-to-leaf walk, in which each step is chosen to inductively maintain the following invariant:
- the conditional probability of failure, given the current state, is less than 1.
This guarantees arriving at a leaf labeled 0, that is, a successful outcome.
The invariant holds initially (at the root), because the original proof showed that the (unconditioned) probability of failure is less than 1. The conditional probability at any interior node is the average of the conditional probabilities of its children. The latter property is important because it implies that any interior node whose conditional probability is less than 1 has at least one child whose conditional probability is less than 1. Thus, from any interior node, one can always choose some child to walk to so as to maintain the invariant. Since the invariant holds at the end, when the walk arrives at a leaf and all choices have been determined, the outcome reached in this way must be successful.
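To make the deterministic walk concrete, here is a minimal Python sketch of the method on the three-coin example (the list-of-outcomes state representation and the function name are illustrative choices, not part of the original argument). Each step fixes one coin so that the conditional expectation of the number of tails never drops below 1.5:

```python
def conditional_expected_tails(fixed, total=3):
    # Expected number of tails given the outcomes fixed so far;
    # each remaining fair coin contributes 1/2 in expectation.
    return sum(fixed) + 0.5 * (total - len(fixed))

fixed = []  # 1 = tails, 0 = heads
for _ in range(3):
    # Step to the child (outcome of the next coin) with the larger
    # conditional expectation; it is at least the current node's value.
    fixed.append(max((1, 0), key=lambda c: conditional_expected_tails(fixed + [c])))

assert sum(fixed) >= 2  # the invariant forces at least 2 tails
```

Here the walk trivially fixes every coin to tails; the point is that the same recipe works whenever the conditional expectation at each node can be computed.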
Efficiency
In a typical application of the method, the goal is to implement the resulting deterministic process by a reasonably efficient algorithm ("efficient" usually means an algorithm whose running time is polynomial in the input size), even though the number of possible outcomes is typically huge (exponentially large). For example, consider the coin-flipping task, but extended to n flips for large n.
In the ideal case, given an intermediate state (a node in the tree), the conditional probability of failure (the label on the node) can be efficiently and exactly computed. (The example above is like this.) If so, the algorithm can choose the next node to go to by computing the conditional probabilities at each child of the current node, then moving to any child whose conditional probability is less than 1. As discussed above, such a child is guaranteed to exist.
Unfortunately, in most applications the conditional probability of failure is not easy to compute efficiently. There are two standard and related techniques for dealing with this:
- Using conditional expectations: Many probabilistic proofs work as follows: they define a random variable Q, show that (i) the expectation of Q is at most (or at least) some threshold value, and that (ii) in any outcome where Q is at most (at least) this threshold, the outcome is a success. Then (i) implies that there exists an outcome where Q is at most the threshold, and this together with (ii) implies that there is a successful outcome. (In the example above, Q is the number of tails, which should be at least the threshold 1.5. In many applications, Q is the number of "bad" events (not necessarily disjoint) that occur in a given outcome, where each bad event corresponds to one way the experiment can fail, and the expected number of bad events is less than 1.)
In this case, to keep the conditional probability of failure below 1, it suffices to keep the conditional expectation of Q below (or above) the threshold. To do this, instead of computing the conditional probability of failure, the algorithm computes the conditional expectation of Q and proceeds accordingly: at each interior node there is some child whose conditional expectation is at most (at least) the node's conditional expectation, and the algorithm moves from the current node to such a child, thus keeping the conditional expectation below (above) the threshold.
- Using a pessimistic estimator: In some cases, instead of the exact conditional expectation of the quantity Q, one can use an appropriately tight bound called a pessimistic estimator. A pessimistic estimator is a function of the current state. It must be an upper (or lower) bound on the conditional expectation of Q given the current state, and it must be non-increasing (or non-decreasing) in expectation at each random step of the experiment. Typically, a good pessimistic estimator can be obtained by carefully deconstructing the logic of the original proof.
Example using conditional expectations
This example demonstrates the method of conditional probabilities using a conditional expectation.
Max-Cut lemma
Given an undirected graph G = (V, E), the maximum cut problem is to color each vertex of the graph with one of two colors (say, black or white) so as to maximize the number of edges whose endpoints have different colors. (Say that such an edge is cut.)
Max-Cut Lemma: In any graph G = (V, E), at least |E|/2 edges can be cut.
Probabilistic proof. Color each vertex black or white according to the flip of a fair coin. For any edge e in E, the probability that it is cut is 1/2. Thus, by linearity of expectation, the expected number of cut edges is |E|/2. Hence there exists a coloring that cuts at least |E|/2 edges. QED
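As an illustration (not part of the proof), the expectation can be checked empirically in Python; the small example graph below is an arbitrary choice:

```python
import random

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
trials = 100_000
total = 0
for _ in range(trials):
    color = {v: random.randrange(2) for v in range(4)}  # uniform 2-coloring
    total += sum(color[u] != color[v] for u, v in edges)
print(total / trials)  # close to len(edges) / 2 = 2.5
```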
The method of conditional probabilities with conditional expectations
To apply the method of conditional probabilities, first model the random experiment as a sequence of small random steps. In this case it is natural to consider each step to be the choice of a color for a particular vertex (so there are |V| steps).
Next, replace the random choice at each step by a deterministic choice that keeps the conditional probability of failure, given the vertices colored so far, below 1. (Here, failure means that in the end fewer than |E|/2 edges are cut.)
In this case, the conditional probability of failure is not easy to calculate. Indeed, the original proof did not calculate the probability of failure directly; instead, it showed that the expected number of cut edges is at least |E|/2.
Let the random variable Q be the number of cut edges. To keep the conditional probability of failure below 1, it suffices to keep the conditional expectation of Q at or above the threshold |E|/2. (As long as the conditional expectation of Q is at least |E|/2, there must be some still-reachable outcome where Q is at least |E|/2, so the conditional probability of reaching such an outcome is positive.) To keep the conditional expectation of Q at |E|/2 or above, the algorithm will, at each step, color the vertex under consideration so as to maximize the resulting conditional expectation of Q. This suffices, because there must be some child whose conditional expectation is at least the current state's conditional expectation (and therefore at least |E|/2).
Given that some of the vertices have already been colored, what is this conditional expectation? Following the logic of the original proof, the conditional expectation of the number of cut edges is
- the number of edges whose endpoints are colored differently so far (cut edges),
- plus 1/2 times the number of edges with at least one uncolored endpoint.
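This quantity is easy to compute from a partial coloring. A minimal sketch, assuming the partial coloring is stored as a Python dict mapping each colored vertex to 0 or 1 (uncolored vertices are simply absent; the representation is chosen here for illustration):

```python
def conditional_expected_cut(edges, color):
    # Conditional expectation of the number of cut edges given the
    # partial coloring: decided edges contribute 0 or 1, and any edge
    # with an uncolored endpoint contributes 1/2.
    expected = 0.0
    for u, v in edges:
        if u in color and v in color:
            expected += (color[u] != color[v])
        else:
            expected += 0.5
    return expected
```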
Algorithm
The algorithm colors each vertex so as to maximize the resulting value of the above conditional expectation. This keeps the conditional expectation at |E|/2 or above, which in turn keeps the conditional probability of failure below 1, which in turn guarantees a successful outcome. By calculation, the algorithm simplifies to the following:
1. For each vertex u in V (in any order):
2. Consider the already-colored neighbors of u.
3. If more of them are black, color u white.
4. Otherwise, color u black.
By construction, this deterministic algorithm is guaranteed to cut at least half the edges of the given graph. (This makes it a 0.5-approximation algorithm for Max-Cut.)
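A runnable sketch of the whole derandomized procedure, reusing conditional_expected_cut from the previous sketch (the majority rule above is an equivalent shortcut, since coloring u changes only the contribution of edges incident to u):

```python
def greedy_max_cut(vertices, edges):
    color = {}
    for u in vertices:
        # Color u so as to maximize the conditional expectation of the cut.
        color[u] = max((0, 1),
                       key=lambda c: conditional_expected_cut(edges, {**color, u: c}))
    return color

vertices = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
coloring = greedy_max_cut(vertices, edges)
assert sum(coloring[u] != coloring[v] for u, v in edges) >= len(edges) / 2
```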
Example using pessimistic estimators
The next example demonstrates the use of a pessimistic estimator.
Turán's theorem
One formulation of Turán's theorem is the following:
- Any graph G = (V, E) contains an independent set of size at least |V|/(D+1), where D = 2|E|/|V| is the average degree of the graph.
Probabilistic proof of Turán's theorem
Consider the following random process for constructing an independent set S :
1. Initialize S to be the empty set.
2. For each vertex u in V, in random order:
3. If no neighbor of u is in S, add u to S.
4. Return S.
Clearly the process computes an independent set. Any vertex u that is considered before all of its neighbors will be added to S. Thus, letting d(u) denote the degree of u, the probability that u is added to S is at least 1/(d(u)+1). By linearity of expectation, the expected size of S is at least

$\sum_{u \in V} \frac{1}{d(u)+1} \;\ge\; \frac{|V|}{D+1}.$
(The inequality above follows because 1/(x+1) is convex in x, so the left-hand side is minimized, subject to the degrees summing to 2|E|, when each d(u) = D = 2|E|/|V|.) QED
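A minimal Python sketch of this random process, assuming the graph is given as an adjacency dict that maps each vertex to the set of its neighbors (a representation chosen for illustration):

```python
import random

def random_independent_set(adj):
    order = list(adj)
    random.shuffle(order)  # consider the vertices in uniformly random order
    S = set()
    for u in order:
        if not any(w in S for w in adj[u]):  # no neighbor chosen before u
            S.add(u)
    return S
```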
The method of conditional probabilities using pessimistic estimators
In this case, the random process has |V| steps. Each step considers some not-yet-considered vertex u and adds u to S if none of its neighbors has yet been added. Let the random variable Q be the number of vertices added to S. The proof shows that E[Q] ≥ |V|/(D+1).
We will replace each random step by a deterministic step that keeps the conditional expectation of Q at or above |V|/(D+1). This will ensure a successful outcome, that is, one in which the independent set S has size at least |V|/(D+1), realizing the bound in Turán's theorem.
Given that the first t steps have been taken, let S(t) denote the set of vertices added so far, and let R(t) denote the set of not-yet-considered vertices with no neighbor in S(t). Following the reasoning in the original proof, any given vertex w in R(t) still has conditional probability at least 1/(d(w)+1) of being added to S, so the conditional expectation of Q is at least

$|S^{(t)}| + \sum_{w \in R^{(t)}} \frac{1}{d(w)+1}.$
Let Q(t) denote this quantity, which is called a pessimistic estimator for the conditional expectation.
The proof showed that the pessimistic estimator is initially at least |V|/(D+1). (That is, Q(0) ≥ |V|/(D+1).) The algorithm will make each choice so as to keep the pessimistic estimator from decreasing, that is, so that Q(t+1) ≥ Q(t) for each t. Since the pessimistic estimator is a lower bound on the conditional expectation, this will ensure that the conditional expectation stays at or above |V|/(D+1), which in turn ensures that the conditional probability of failure stays below 1.
Let u be the vertex considered by the algorithm in step (t+1).
If u already has a neighbor in S, then u is not added to S and (by inspection of Q(t)) the pessimistic estimator is unchanged. If u has no neighbor in S, then u is added to S.
By calculation, if u is chosen uniformly at random from R(t), the expected increase in the pessimistic estimator is non-negative. [The calculation: given that u is chosen from R(t), the probability that a given term 1/(d(w)+1) is dropped from the sum in the pessimistic estimator is at most (d(w)+1)/|R(t)|, since the term is dropped only if u is w itself or one of the at most d(w) neighbors of w. So the expected decrease contributed by each term is at most 1/|R(t)|. There are |R(t)| terms in the sum, so the expected decrease of the sum is at most 1. Meanwhile, the size of S increases by 1.]
Thus, there must exist some choice of the vertex u that keeps the pessimistic estimator from decreasing.
Algorithm maximizing the pessimistic estimator
The algorithm below chooses each vertex u so as to maximize the resulting pessimistic estimator. By the previous considerations, this keeps the pessimistic estimator from decreasing and guarantees a successful outcome.
Below, N(t)(u) denotes the neighbors of u in R(t) (that is, the neighbors of u that are neither in S nor have a neighbor in S).
1. Initialize S to be the empty set.
2. While there exists a not-yet-considered vertex u with no neighbor in S:
3. Add to S such a vertex u that minimizes $\sum_{w \in N^{(t)}(u) \cup \{u\}} \frac{1}{d(w)+1}$ (this choice maximizes the resulting estimator Q(t+1)).
4. Return S.
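A Python sketch of this algorithm under the same adjacency-dict representation as before (names such as deg0 and R are illustrative). The set R plays the role of R(t), and the chosen u minimizes the estimator mass removed from the sum:

```python
def turan_independent_set(adj):
    deg0 = {u: len(adj[u]) for u in adj}  # degrees in the original graph
    S, R = set(), set(adj)                # R = current R(t)
    while R:
        # Estimator mass lost if v is added: the terms for v itself and
        # for v's neighbors still in R all leave the sum.
        loss = lambda v: sum(1 / (deg0[w] + 1) for w in ({v} | (adj[v] & R)))
        u = min(R, key=loss)
        S.add(u)
        R -= {u} | adj[u]                 # u and its neighbors leave R(t)
    return S
```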
Algorithms that do not maximize the pessimistic estimator
For the method of conditional probabilities to work, it suffices for the algorithm to keep the pessimistic estimator from decreasing (or increasing, as appropriate); the algorithm does not necessarily have to maximize (or minimize) it. This gives some flexibility in deriving the algorithm. The next two algorithms illustrate this.
1. Initialize S to be the empty set.
2. While there exists a not-yet-considered vertex u in the graph with no neighbor in S:
3. Add such a vertex u to S, where u minimizes d(u) (the initial degree of u).
4. Return S.
1. Initialize S to be the empty set.
2. While the remaining graph is not empty:
3. Add to S a vertex u of minimum degree in the remaining graph.
4. Delete u and all of its neighbors from the graph.
5. Return S.
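For instance, a Python sketch of the second algorithm, again assuming an adjacency-dict representation with neighbor sets (illustrative, not from the original text):

```python
def min_degree_independent_set(adj):
    remaining = {u: set(adj[u]) for u in adj}  # mutable copy of the graph
    S = set()
    while remaining:
        # Take a vertex of minimum degree in the remaining graph.
        u = min(remaining, key=lambda v: len(remaining[v]))
        S.add(u)
        dead = {u} | remaining[u]
        for v in dead:                 # delete u and its neighbors...
            remaining.pop(v, None)
        for nbrs in remaining.values():
            nbrs -= dead               # ...and all edges into them
    return S
```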
Each algorithm is analyzed with the same pessimistic estimator as before. With each step of either algorithm, the net increase in the pessimistic estimator is

$1 - \sum_{w \in N^{(t)}(u) \cup \{u\}} \frac{1}{d(w)+1},$

where N(t)(u) denotes the neighbors of u in the remaining graph (that is, in R(t)).
For the first algorithm, the net increase is non-negative because, by the choice of u,

$\sum_{w \in N^{(t)}(u) \cup \{u\}} \frac{1}{d(w)+1} \;\le\; (d(u)+1)\,\frac{1}{d(u)+1} \;=\; 1,$

where d(u) is the degree of u in the original graph.
For the second algorithm, the net increase is non-negative because, by the choice of u,

$\sum_{w \in N^{(t)}(u) \cup \{u\}} \frac{1}{d(w)+1} \;\le\; (d'(u)+1)\,\frac{1}{d'(u)+1} \;=\; 1,$

where d′(u) is the degree of u in the remaining graph.
See also
- Probabilistic method
- Derandomization
Notes
1. Erdős, Selfridge 1973.
2. Spencer 1987.
3. Raghavan 1988.
References
- Paul Erdős, J. L. Selfridge. On a combinatorial game // Journal of Combinatorial Theory, Series A. 1973. Vol. 14, no. 3. P. 298–301. doi:10.1016/0097-3165(73)90005-8.
- Joel H. Spencer. Ten Lectures on the Probabilistic Method. SIAM, 1987. ISBN 978-0-89871-325-1.
- Prabhakar Raghavan. Probabilistic construction of deterministic algorithms: approximating packing integer programs // Journal of Computer and System Sciences. 1988. Vol. 37, no. 2. P. 130–143. doi:10.1016/0022-0000(88)90003-7.
Further reading
- Noga Alon, Joel Spencer. The Probabilistic Method. 3rd ed. Hoboken, NJ: John Wiley and Sons, 2008. P. 250 ff. (Wiley-Interscience Series in Discrete Mathematics and Optimization). ISBN 978-0-470-17020-5. (2nd ed.: ISBN 978-0-471-65398-1.)
- Rajeev Motwani, Prabhakar Raghavan. Randomized Algorithms. Cambridge University Press. P. 120 ff. ISBN 978-0-521-47465-8.
- Vijay Vazirani. Approximation Algorithms. Springer Verlag. P. 130 ff. ISBN 978-3-540-65367-7.
External links
- The probabilistic method - method of conditional probabilities, blog entry by Neal E. Young, accessed 19/04/2012.