Background
The external memory, or DAM, model defines the cost of an algorithm by the number of I/Os it performs (essentially, the number of cache misses). These running times are generally given in terms of M, the size of the memory, and B, the number of words that can be transferred to memory at one time. Sometimes L and Z are used for B and M, respectively.
For example, sorting requires a cost of Θ((N/B) log_{M/B}(N/B)) and naive matrix multiplication requires Θ(N^3/(B√M)).
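As a small illustration (my own addition, not from the original post), here is a Python sketch that evaluates these two I/O bounds for concrete values of N, M, and B; the function names and the example parameters are made up for the demonstration.

```python
import math

def sort_io(N, M, B):
    """External-memory sorting bound: Theta((N/B) * log_{M/B}(N/B))."""
    return (N / B) * math.log(N / B, M / B)

def matmul_io(N, M, B):
    """Naive (blocked) N x N matrix multiplication bound: Theta(N^3 / (B * sqrt(M)))."""
    return N ** 3 / (B * math.sqrt(M))

# Example: 2^20 words of input, a 2^15-word memory, 2^7-word blocks.
N, M, B = 2 ** 20, 2 ** 15, 2 ** 7
print(f"sorting I/Os      ~ {sort_io(N, M, B):,.0f}")
print(f"matrix mult. I/Os ~ {matmul_io(N, M, B):,.0f}")
```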
This model is used to analyze "cache-oblivious algorithms", which have no knowledge of B or M. Generally the goal is for the cache-oblivious algorithm to perform optimally in the external memory model; this is not always possible, as in the permutation problem, for example (shown in Brodal, Fagerberg 2003). See this write-up by Erik Demaine for further explanation of cache-oblivious algorithms, including discussions of sorting and matrix multiplication.
We can see that changing M causes a logarithmic speedup for sorting and a polynomial speedup for matrix multiplication. (This result is originally from Hong, Kung 1981 and actually predates both cache obliviousness and the formalization of the external memory model.)
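To make that observation concrete, here is a short worked calculation (my own addition) of what happens to each bound when the memory grows from M to cM for some factor c > 1:

```latex
% Sorting: M only enters through the base of the logarithm.
\frac{(N/B)\,\log_{M/B}(N/B)}{(N/B)\,\log_{cM/B}(N/B)}
  \;=\; \frac{\log(cM/B)}{\log(M/B)}
  \;=\; 1 + \frac{\log c}{\log(M/B)}
  \qquad \text{(only a logarithmic gain)}

% Matrix multiplication: M enters polynomially.
\frac{N^3/(B\sqrt{M})}{N^3/(B\sqrt{cM})} \;=\; \sqrt{c}
  \qquad \text{(a polynomial gain)}
```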
My question is this:
Is there any case where the speedup is exponential in M? The running time would be something like f(N, B)/2^{O(M)}. I am particularly interested in a cache-oblivious algorithm or data structure that fits this description, but would be happy with a cache-aware algorithm/data structure, or even just a best known lower bound.
It is generally assumed in most models that the word size w = Ω(log N), where N is the input size, and clearly M > w. Then a speedup of 2^M gives a polynomial speedup in N. This makes me believe that if the problem I am looking for exists, it is not polynomial. (Otherwise we could change the cache size by a constant to get a constant number of I/Os, which seems unlikely.)
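Spelling out that step (my own addition): with the usual assumption that the word size is w = Ω(log N) and M > w, we have

```latex
2^{M} \;>\; 2^{w} \;\ge\; 2^{c \log N} \;=\; N^{c} \quad \text{for some constant } c > 0,
```

so a 2^M factor in the running time is already at least a polynomial factor N^c in the input size.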
Answers:
I am not sure I understand the question. But it seems to me that, under the assumption that PSPACE contains problems that require exponential time, such problems would meet your requirements, since if M is O(log N) you would need an exponential number of I/O operations (because you cannot "stay" in the same memory block of size O(log N) for more than a polynomial number of steps without entering a cycle), while if M = N you would need only a linear number of I/O operations. Moreover, regarding your observation that such a problem cannot belong to P: this is correct if the speedup must hold for values of M that are Ω(N) (since that would mean an exponential number of I/O operations). But if the speedup only applies to smaller values of M, intuitively I believe this is not true, because I think it should be possible to design a problem that is in fact the concatenation of smaller problems of size O(log N), each requiring exponential time in its own size, and an exponential number of I/O operations, that is, poly(N), since poly(N) is exponential in O(log N). In practice, I believe PSPACE-complete problems such as TQBF would satisfy these conditions.
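A small note spelling out the arithmetic behind that last parenthetical (my addition, not the answerer's):

```latex
\mathrm{poly}(N) \;=\; N^{O(1)} \;=\; 2^{O(\log N)},
```

so a poly(N) number of I/Os is indeed exponential in the block size O(log N).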
This question appears to veer into terra incognita, i.e. unexplored territory. After some thinking and review, here are some thoughts/ideas on this apparently very difficult/subtle question, which will hopefully be intelligent but are not meant to be definitive.
Demaine, whom you cite, writes: "the principle idea [of cache oblivious algorithms] is simple: design external-memory algorithms without knowing B and M. But this simple idea has several surprisingly powerful consequences."
It also appears to have surprisingly subtle implications. It seems hard to analyze this model for extremal behavior, and it appears nobody has done so yet. There are some surveys, and they seem to cover the entire field so far. This is not surprising given that the field is only about a decade old.
The model does not appear to have been translated to Turing machines yet, where a more strict/formal/theoretical/general/standard analysis might be possible. The whole concept of Big-O notation, with its implications and intuitions such as the irrelevance of constants, does not necessarily carry over automatically to this area of cache-oblivious algorithms. For example, the model already seems to work with constants B, M that measure exact memory sizes. The field does not appear to have a set of basic axioms for its dynamics so far.
TMs were invented by Turing in 1936, and the Hartmanis-Stearns time/space hierarchy theorems (which you are somewhat alluding to in this question) were discovered in 1965. That is a remarkable ~3 decades, yet the time/space hierarchy theorems are now considered somewhat elementary and are taught in undergraduate classes. There do not seem to be corresponding hierarchy theorems established in this model yet, and how to establish them is not a trivial problem. This model actually seems to have more "moving parts" than the standard Turing machine (which already has rather fiendishly complex dynamics); i.e., it is like an augmented Turing machine.
Another idea is to convert this external memory model to TMs somehow, which again does not appear to show up in the literature/surveys on cache-oblivious algorithms, and then see how the Hartmanis-Stearns hierarchy theorems might translate. It appears to correspond to a two-tape TM where one tape has size M and the other tape is infinite, with blocks transferred to the size-M tape in chunks of size B. One difficulty here is that it is more of a RAM model than the Turing model, which uses sequential access to the tape; on the other hand, this could run into the subtleties associated with conversions between single-tape and multitape TMs.
I suggest attacking this problem empirically with simulations, which is where the literature tends to focus, e.g. as in this great survey by Kumar on cache-oblivious algorithms (2003). There are many programs and papers online for cache simulations, and one could possibly answer your question without a large amount of code, using some simplifications. One basic idea is to create probabilistic algorithms that access "nearby" areas of memory based on probabilities. "Nearby" here is not necessarily contiguous memory; rather, the algorithm selects random memory pages (blocks) based on keeping track of its own most recently accessed pages. I suggest using power laws to determine the probability of selecting "near" pages in the "most recently accessed" list. This appears to be the key aspect to which cache-based performance improvements are related; a rough sketch of such a simulation is given after this paragraph.
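A minimal sketch of such a simulation (entirely my own construction; the parameters, the simple LRU cache of M/B page slots, and the 0.9 locality probability are made-up choices, not anything from the literature cited above):

```python
import random

def simulate(num_pages=10_000, accesses=50_000, cache_blocks=64, alpha=2.0, seed=0):
    """Count cache misses (I/Os) for an access stream with power-law locality.

    With probability 0.9 the next access goes to a recently used page, with the
    rank-th most recent page chosen with probability proportional to rank^(-alpha);
    otherwise a uniformly random page is accessed. The cache is a plain LRU
    list of `cache_blocks` page slots (i.e., M/B blocks).
    """
    rng = random.Random(seed)
    recent = []   # most recently used pages, most recent first (bounded below)
    cache = []    # LRU cache of page ids, most recent first
    misses = 0
    for _ in range(accesses):
        if recent and rng.random() < 0.9:
            weights = [(rank + 1) ** -alpha for rank in range(len(recent))]
            page = rng.choices(recent, weights=weights, k=1)[0]
        else:
            page = rng.randrange(num_pages)
        # update the recency list (bounded so the power law has finite support)
        if page in recent:
            recent.remove(page)
        recent.insert(0, page)
        del recent[200:]
        # LRU cache: a miss costs one I/O (one block transfer)
        if page in cache:
            cache.remove(page)
        else:
            misses += 1
            if len(cache) >= cache_blocks:
                cache.pop()   # evict the least recently used page
        cache.insert(0, page)
    return misses

# How does the I/O count respond to a larger cache (i.e., larger M)?
for blocks in (16, 64, 256):
    print(f"cache of {blocks:3d} blocks -> {simulate(cache_blocks=blocks)} I/Os")
```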
Here is a rough argument based on M, which is basically a measure of "core memory" versus disk memory. For an algorithm, one would expect that as core memory increases, one only comes close [asymptotically] to a linear improvement in algorithmic speed, and adding "core memory" to get any super-linear increase in algorithm speed seems intuitively almost like exceeding the speed limit and "getting something for nothing". However, I don't grasp the model well enough to prove this [but apparently no one else has either, including the authorities/founders/experts].
The cache-oblivious algorithm literature has focused on P algorithms, and I have not seen a single analysis of a non-P algorithm so far; perhaps none exists. This could possibly be because the analysis is too difficult, in which case questions about extremal behavior may go unanswered in this field in the long term.
It does not even seem intuitively clear, and has possibly not been studied at all yet, how "small"-complexity algorithms (such as those in L) or "big" algorithms (such as non-P ones, e.g. EXPSPACE, etc.) behave in this model. The model measures memory locality, which seems to be fundamentally different from, but intertwined with, the time and space hierarchies.
There is a somewhat obscure measure of Turing machine complexity called "reversal complexity", with some studies, results, and papers that might be related. Basically, the TM can cover a certain amount of time and space, but reversals are a somewhat independent measure of the tape head's behavior during the computation. It seems that "high reversals" might be related to "high memory locality", because in both cases the tape head tends to stay in a "smaller" region rather than moving into larger regions.
This question and model remind me of Amdahl's law, and I suspect some kind of yet-undiscovered similar law related to diminishing returns or balances/tradeoffs might hold or be applicable in this area. Rough reasoning: cache-oblivious algorithm theory looks at a balance/tradeoff between a finite "local" memory and a cost-based external "infinite" disk. These are basically two resources that behave "in parallel", and there is likely some kind of optimal tradeoff between them.