Aceleração exponencial na memória externa

fundo

A memória externa, ou modelo DAM, define o custo de um algoritmo pelo número de E / Ss realizadas (essencialmente, o número de falhas de cache). Esses tempos de execução geralmente são dados em termos de $M$ , tamanho da memória e $B$ , número de palavras que podem ser transferidas para a memória ao mesmo tempo. Às vezes $L$ e $Z$ são usados para $B$ e $M$ respectivamente.

Por exemplo, a classificação requer um custo de $\Theta(N/B\log_{M/B} N/B)$ e a multiplicação de matriz ingênua requer $\Theta(n^3/B\sqrt{M})$ .

Este modelo é usado para analisar "algoritmos de cache-alheio", que não têm conhecimento do $B$ ou $M$ . Geralmente, o objetivo é que o algoritmo alheio ao cache tenha um desempenho ideal no modelo de memória externa; isso nem sempre é possível, como no problema de Permutação, por exemplo (mostrado em Brodal, Faderberg 2003 ). Veja este artigo de Erik Demaine para obter uma explicação adicional dos algoritmos que ignoram o cache, incluindo discussões sobre classificação e multiplicação de matrizes.

Podemos ver que a mudança de causa uma aceleração logarítmica para classificação e uma aceleração polinomial para multiplicação de matrizes. (Esse resultado é originalmente de Hong, Kung 1981 e, na verdade, antecede o esquecimento do cache e a formalização do modelo de memória externa). $M$

Minha pergunta é esta:

Existe algum caso em que a aceleração é exponencial em ? O tempo de execução seria algo como $M$ $f(N,B)/2^{O(M)}$ . Estou particularmente interessado em um algoritmo ou estrutura de dados inconsciente de cache que se encaixa nessa descrição, mas ficaria satisfeito com um algoritmo / estrutura de dados com reconhecimento de cache ou mesmo com um limite inferior mais conhecido.

Geralmente, é assumido na maioria dos modelos que o tamanho da palavra se for o tamanho da entrada e claramente . Então, um aumento de velocidade de dá um aumento de velocidade polinomial em . Isso me faz acreditar que, se o problema que estou procurando existe, não é polinomial. (Caso contrário, podemos alterar o tamanho do cache por uma constante para obter um número constante de E / Ss, o que parece improvável). $w = \Omega(\log N)$ $N$ $M > w$ $2^M$ $N$

ds.algorithms ds.data-structures cache-oblivious SamM
fonte

pode adivinhar, mas

? Encontrou um caso dado como aceleração

, suficiente?

N =

$N=$

B^{p o l y l o g (B)}

$B^{\rm{polylog}(B)}$

vzn

Infelizmente, tem que ser em termos de

para os meus propósitos. Eu estaria interessado na referência embora.

M

$M$

SamM 07/07

wikipedia sobre algoritmos alheios ao cache . fyi existe alguma sutileza nessa notação de campos. A nota de rodapé p7 de Demaine diz nesta área:

é o tamanho do problema e às vezes

onde

é o número de blocos, "mas a notação em minúscula parece ter caído em desuso". você usa

acima e, alternativamente,

aparentemente ambos como tamanho de entrada. pense que você deveria pelo menos padronizar sua pergunta.

N

$N$

n = N / B

$n = N/B$

n

$n$

n

$n$

N

$N$

vzn

Eu editei para consistência.

é o tamanho da entrada, e

é usado apenas para a multiplicação de matrizes, porque o tempo de execução para esse problema é geralmente definido em termos de uma

matriz (ou seja,

)

N

$N$

n

$n$

n \times n

$n\times n$

N = n^{2}

$N = n^2$

SAMM

não vejo casos disso depois de digitalizar a literatura. talvez não haja tal ref? talvez exista algum caso em que qualquer algoritmo desse tipo possa ser complexo e, portanto, difícil de analisar teoricamente para obter tal aceleração ...? ou talvez tivesse que ser inventado ...? ou talvez não seja possível? poderia haver uma idéia de que o acesso aleatório à memória é o pior caso possível? parece que o aumento da velocidade é linear em

para esse caso ...? ou talvez algum padrão fractal de acesso à memória seja o pior caso? esta linha de estudo é apenas um pouco mais de uma década de idade ....

M

$M$

vzn

Respostas:

Não tenho certeza se entendi a pergunta. Mas parece-me que, sob a suposição de que contém problemas que exigem tempo exponencial, esses problemas atenderiam aos seus requisitos, pois, se for você precisará de um número exponencial de operações de E / S ( já que você não pode "permanecer" no mesmo bloco de memória de tamanho mais do que um número polinomial de etapas sem entrar em um ciclo) e se $PSPACE$ $M$ $O(\log N)$ $O( \log N)$ $M=N$ você precisaria apenas de um número linear de operações de E / S. Além disso, com relação à sua observação de que esse problema não pode pertencer a , é correto se a aceleração deve ser mantida para valores de que são (já que isso significa que temos um número exponencial de operações). Mas se a aceleração se aplica apenas a valores menores de , intuitivamente, acredito que isso não seja verdade, porque acho que deve ser possível projetar um problema que é de fato a concatenação de problemas menores do tamanho cada um exigindo exponencial tempo no seu próprio tamanho, e um número exponencial de operações de I / o (isto é, $P$ $M$ $\Omega(N)$ $M$ $O(\log N)$ , uma vez que é exponencial em $poly (N)$ $poly(N)$ $O(\log N)$ $PSPACE$ $TQBF$

user8477
fonte

M = Ω (N)

$M= \Omega(N)$

M = O (\log N)

$M = O(\log N)$

I don't understand why any problem in external memory is trivial if

M = Ω (N)

$M = \Omega(N)$ . If

M = N / 2

$M=N/2$ and the algorithm takes, exponential time, you might be forced to swap back and forth between (so to speak) the two halves of the memory an exponential number of times.

user8477

Ah, of course you're right about the constant factor being important. That makes a lot of sense; this is certainly a good starting point.

SamM

I don't have any argument for intermediate values. At a very superficial level, I guess that some backtracking algorithms would have an exponential dependence on the memory size because I/O operations would be required at nodes of lower depth in the search tree. This dependence would apply to intermediate values. This says nothing about the inherent complexity of the problem, of course. Also, if you have

M = ω (\log N)

$M=\omega(\log N)$ , the pigeonhole (cycling) argument given above would still yield a gain of

T (N) / 2^{M}

$T(N)/2^M$ where

T (N)

$T(N)$ is the time complexity of the problem.

user8477

-4

this question appears to veer into terra incognita ie unexplored territory. after some thinking & review here are some thoughts/ideas on this apparently very difficult/subtle question which hopefully will be intelligent but are not meant to be definitive.

Demain who you cite writes, "the principle idea [of cache oblivious algorithms] is simple: design external-memory algorithms without knowing $B$ and $M$ . But this simple idea has several surprisingly powerful consequences."

it also appears to have surprisingly subtle implications. it appears hard to analyze this model for extremal behavior and it appears nobody has done this so far. there are some surveys and they seem to survey the entire field so far. this is not surprising given the field is only ~1 decade old.

the model does not appear to have been translated to Turing machines yet where more strict/formal/theoretical/general/standard analysis may be possible. the whole concept of Big-Oh notation and its implications and intuitions such as irrelevance of constants is not necessarily automatically carrying in this area of cache oblivious algorithms. for example the model already seems to be working with constants $B,M$ which measure exact memory sizes. it appears the field does not have a set of basic axioms of its dynamics so far.
TMs were invented in 1936 by Turing and the Hartmanis-Stearns time/space hierarchy theorems (which you are somewhat alluding to in this question) were discovered in 1965. thats a remarkable ~3 decades yet the time/space hierarchy theorems are now considered somewhat elementary and taught in undergraduate classes. there does not seem to be corresponding hierarchy theorems established in this model yet, and how to do that is not a trivial problem. this model actually seems to have more "moving parts" than the standard Turing machine (which already has a rather fiendishly complex dynamics), ie is like an augmented Turing machine.
another idea is to convert this external memory model to TMs somehow, which again does not appear to show up in the literature/surveys on cache oblivious algorithms, and maybe see how the Hartmanis-Stearns hierarchy theorems could translate. it appears to relate to a two tape TM where one tape is size 'M' and the other tape is infinite, and blocks are transferred to 'M' in sizes of 'B'. but also, a difficulty here is it is more of a RAM model than the Turing model which uses sequential access to the tape. on the other hand this could run into subtleties associated with conversions between single and multitape TMs.
suggest attacking this problem empirically with simulations which is where the literature tends to focus on eg as in this great survey by Kumar on Cache oblivious algorithms (2003). there are many programs and papers online for cache simulations and one could possibly answer your question without a large amount of code, using some simplifications. one basic idea is to create probabilistic algorithms which access "nearby" areas of memory based on probabilities. "nearby" here is not necessarily contiguous memory, but instead an algorithm that selects random memory pages (blocks) based on keeping track of its own most recently accessed pages. suggest using power laws to determine the probability of selecting "near" pages in the "most recently accessed" list. this appears to be the key aspect that cache-based performance improvements are related to.
heres a rough argument based on 'M' which is basically a measure of "core memory" versus disk memory. for an algorithm, one would expect as core memory increases, one only comes close [asymptotically] to a linear improvement in algorithmic speed, and adding "core memory" to get any super-linear increase in algorithm speed seems intuitively almost like exceeding the speed limit and "getting something for nothing". however, dont grasp the model well enough to prove this [but apparently no one else has either, including the authorities/founders/experts].
this cache-oblivious algorithm literature has focused on P algorithms and did not see any case at all of an analysis of a non-P algorithm so far and perhaps none exists. this could possibly be because the analysis is too difficult! in which case, questions about extremal behavior may go unanswered in this field in the long term.
it does not even seem intuitively clear, or possibly at all studied yet, about how "small" complexity algorithms such as in L, or "big" algorithms such as non-P (eg Expspace etc) behave in this model. the model is measuring memory locality, which seems to be fundamentally different, but intertwined with, time and space hierarchies.
there is a somewhat obscure measurement of Turing machine complexity called "reversal complexity" with some study, results, and papers that might be related. basically the TM can cover a certain amount of time and space, but reversals are a somewhat independent measurement of the tape head during the computation. it seems that "high reversals" might be related to "high memory locality" because in both cases the tape head is tending to stay in a "smaller" region versus moving into larger regions.
this question and model reminds me of Amdahls law and suspect some kind of yet-undiscovered similar law related to diminishing returns or balances/tradeoffs might hold or be applicable in this area. rough reasoning: cache oblivious algorithm theory is looking at a balance/tradeoff between a finite "local" memory and a cost-based external "infinite" disk. this is basically two resources that behave "in parallel" and there is likely some kind of optimal tradeoff between them.

vzn
fonte

In your first point, what would "more theoretical analysis" even mean? A TM doesn't have a

k

$k$ -level memory hierarchy. The ideal-cache model is easy and natural to work with, and has been experimentally proven to be a useful model with respect to memory behavior on real systems.

Juho

the TM model is the basic model of TCS and "bridge thms" between its complexity hierarchy (time/space, basic complexity classes such as P/NP etc) with cache oblivious algorithms apparently remain to be mapped out. the external memory models & related cache oblivious models are basically attempting to model real-world performance characteristics & the literature is not so far interested in greater theoretical abstractions such as asked by the question.

vzn