Красно-зелёные деревья, снова.

Сообщений 1 страница 3 из 3

Поделиться12017-07-10 10:58:45

Автор: ВежливыйЛис
Администратор
Зарегистрирован: 2017-03-25
Приглашений: 0
Сообщений: 1116
Уважение: [+2/-1]
Позитив: [+4/-1]
Провел на форуме:
7 дней 13 часов
Последний визит:
2025-04-16 01:39:42

Я уже где-то про них писал, но не могу найти ту тему. Всё равно мне на неё никто не ответил.

В общем смысл такой:
- существует (в мире Java) самый передовой в мире парсер Antlr с алгоритмом ALL(*)

его альтернативной (в мире C#) является ещё более продвинутая платформа парсинга Roslyn (под лицензией Apache 2.0)
Microsoft Visual Studio 2015 was completely redesigned to use the .NET Compiler Platform. Refactorings, diagnostics, code fixes, find references, go to definition and so on are re-implemented using Roslyn.

Вот в этом рослине и применяются красно-зелёные деревья.

Из смысл в том, что при редактировании текста он сьезжает (позции меняются), поэтому синтаксические узлы разделены на две части - зелёную, которую можно сохранить, и красную, которая при сьезжании пересоздаётся (хм, а просто индекс нельзя было поправить?).

см. также
https://wiki.haskell.org/Zipper
BCL immutable collections

2012-08-26, русский перевод, Персистентность, фасады и красно-зеленые деревья в Roslyn
2012-06-08, Eric Lippert, Persistence, Facades and Roslyn’s Red-Green Trees

class SyntaxNode - http://source.roslyn.codeplex.com/ #Microsoft.CodeAnalysis/Syntax/SyntaxNode.cs,849dc6029695ef7b

part of the reason for making these nodes immutable was so they could be used from multiple threads without need for any locks (or even for lock-free techniques, for that matter). But if the red nodes are created on demand, I take it that must mean you *do* have some locking around that lazy-creation?
I've heard rumors about Roslyn being parallel, but have never seen definitive information about that. What parts of Roslyn can run in parallel? Parsing? Typechecking? IL emission? Are there any benchmarks that compare Roslyn-based compilers with old ("native") ones?

https://roslyn.codeplex.com/discussions/541953

As you may notice from descriptions and from the actual implementation, green nodes are immutable and self-contained. In particular, green node does not know its position in text and does not know its parent. As a result the same green node for "2 - 2 " can represent any occurrence of "2 - 2 ", - as long as it is parses into identically shaped green sub-tree, why not just use the same sub-tree?
That is achieved in green node factories - before creating a green node for "2 - 2 " the factory will check if it has already seen a binary expression with same children "2 ", "- ", "2 " and if answer is yes, the it will just return a cached node. One can see it as a kind of memorization.

a very large fraction of nodes in the green tree are tokens - terminal nodes like punctuation, keywords, identifiers, literals. They are also ones with the most repetitions - once you use variable "x" in code you are likely to use it more than once.

Red nodes are the public façade to the syntax model. Unlike a green node, red node:
knows its location span in the tree
knows not only children, but the parent too
knows the syntax tree to which the node belongs.

red nodes cannot be reused within a tree (all nodes have different spans) or between trees (different parents). The problem arises – if there is a red tree for every version of the text, how can we afford a new tree every time when user types a character? The solution is to build the red tree lazily.

Laziness helps tremendously in scenarios like typing. - When user is furiously typing, no one has the time or need to examine the entire red tree. Once user types another character the current tree becomes irrelevant and any analysis, if started, tend to be cancelled and restarted with the new tree (often with a small delay – in case user will continue typing, no need to start anything too eagerly). As a result intermediate red trees, while fully "available", will generally see only a tiny fraction of nodes realized.

Red nodes are remembered by their parent red nodes in most circumstances and the root is held by the syntax tree, so they are shared by all users of the same tree instance, but are not otherwise cached.

There is a "new tree" for every edit, but remember that most of the green nodes are shared across successive edits. If the tree is of depth 'h', then on average only new O(h) green nodes are created. There could be a distinct red node in each version of the tree, but red nodes are created lazily

We do cache trees, based on the contents of the file, but it isn't an unlimited cache. On undo (or any edit) we check if a tree corresponding to the editor's content exists, and use it if it does, or else create a new tree with the incremental lexer and parser. Any time we create a new tree, we add it to the cache, which may cause other, less used trees to be evicted from the cache.

https://github.com/dotnet/roslyn/issues/6296

Отредактировано Лис (2017-07-10 12:09:41)

Поделиться22017-07-10 11:57:33

Автор: ВежливыйЛис
Администратор
Зарегистрирован: 2017-03-25
Приглашений: 0
Сообщений: 1116
Уважение: [+2/-1]
Позитив: [+4/-1]
Провел на форуме:
7 дней 13 часов
Последний визит:
2025-04-16 01:39:42

== Weak red subtrees.

An interesting thing to note about red nodes is that they have reference identity. Same piece of syntax in a given tree is observably represented by the same unique red node object. Example: If one gets from a leftmost child of some node to the node via parent reference and then descends from the parent into its leftmost child he will get exactly the same node from which he started. This is a very important quality of the red nodes. Equality of nodes is a very cheap reference comparison.
The implementation is obvious - parents store references to children that are already created and just return already created ones if these children are need again. A side-effect of such implementation is that it makes the red tree a fully connected graph (can get from any node to any another). From the prospective of GC as long as any node in the tree is reachable, the whole red tree would be retained. Once nodes are created, they would live as long as the whole tree. That is a bit inconvenient. Imagine user scrolling through a code file – colorizer will materialize the nodes for the visible syntax, but after the view moves the nodes become invisible and no longer interesting for most kinds of analysis.

A weakly referenced red subtrees can be reclaimed by GC if noone uses them and if needed “decompressed” again. Note that this does not violate the principle of reference identity. Newly constructed red nodes are indeed new objects, but since the old nodes representing same syntax are no longer reachable, no one would be able to notice the difference.

Obviously, we do not want to make all the children references in the tree weak. Weak references take space themselves, not as fast as regular references and require extra GC housekeeping. It has been observed that many kinds of analysis, including binding, are much more likely to require syntax within same method where analysis is performed while reaching for other method bodies is rare. Based on this, Roslyn red trees refer to method body syntax weakly. - If a particular method body is no longer used, GC can put its red nodes to a better use. Typically at any given time only some methods in a large file would have materialized red trees thus significantly reducing average heap size.

https://robinsedlaczek.com/2015/04/29/i … eakroslyn/

Отредактировано Лис (2017-07-10 11:58:11)

Поделиться32017-07-10 12:14:48

Автор: ВежливыйЛис
Администратор
Зарегистрирован: 2017-03-25
Приглашений: 0
Сообщений: 1116
Уважение: [+2/-1]
Позитив: [+4/-1]
Провел на форуме:
7 дней 13 часов
Последний визит:
2025-04-16 01:39:42

2013, JEFFREY L. OVERBEY, Immutable Source-Mapped Abstract Syntax Tree, A Design Pattern for Refactoring Engine APIs
Immutable, source-mapped ASTs are used by the:
- Eclipse Java Development Tools (JDT)
- Eclipse C/C++ Development Tools (CDT)
- Scala IDE for Eclipse
- Microsoft “Roslyn” Community Technology Preview (CTP)
- Clang
- original refactoring engine in Apple X code

However, while all use immutable ASTs with source mapping information, the actual rewriting APIs vary substantially.

2015, Павел Авсенин, Roslyn: использование в крупном проекте на примере CodeRush

Использование API снаружи:
2014-07-06, , Learn Roslyn Now: Part 2 Analyzing Syntax Trees with LINQ
2014-07-11, , Learn Roslyn Now: Part 3 Syntax Nodes and Syntax Tokens

документация - https://roslyn.codeplex.com/wikipage?ti … umentation

Отредактировано Лис (2017-07-10 12:38:13)

Применение искинов - шоссе империализма (Стенгазета русификаторов ИТ)

Меню навигации

Пользовательские ссылки

Информация о пользователе

Красно-зелёные деревья, снова.

Сообщений 1 страница 3 из 3

Поделиться12017-07-10 10:58:45

Поделиться22017-07-10 11:57:33

Поделиться32017-07-10 12:14:48