Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type inference prototype based on static analysis #92

Open
wants to merge 105 commits into
base: develop
Choose a base branch
from

Conversation

0x7CFE
Copy link
Owner

@0x7CFE 0x7CFE commented Jun 18, 2016

This PR is my first rather independent (or naïve?) attempt to realize how type inference may be used to aid JIT VM. Underlying concept is not based on any well-known theory like Hindley-Milner type system or Martin Löf's Intuitionistic type theory. It's the result of a pure meditation on Smalltalk bytecodes.

Surprisingly enough, Smalltalk being fully dynamic in it's nature is still very regular in terms of it's memory access and control flow. This really helps when we try to perform static analysis.

Current implementation concentrates on the method temporaries and gives up completely when it faces object fields. However, I believe that in local context it is still possible to infer the object's fields to fully unlock further analysis and optimizations, like stack allocation, GC root elimination, and of course TBAA.

Current implementation is powerful enough to infer self assigning temporaries within a loop and even in complex closure contexts.

For example, the following method is inferred completely:

testInference |sum|
    sum <- 0.
    1 to: 100 do:
        [ :x | sum <- sum + x ].
    ^sum

Here analyzer proves that sum variable will have SmallInt type in it's scope which spans across the follwing methods: Undefined>>testInference, Number>>to:do:, Block>>value: and finally, the block Undefined>>testInference@8. Integer overflow is currently undefined.

This allows IR generator to encode operations on sum directly as i32 without any worries about GC or dynamic dispatch.

Current inference scheme highly uses method monomorphisation and specialization, which helps to perform calculations at compile time.

For example, this code is reduced to a single literal value (the result) at compile time:

fibonacci: n
    n < 3 ifTrue: [ ^1 ].
    ^ (self fibonacci: n - 2) + (self fibonacci: n - 1)

Type inference works even in case of recurring contexts and correctly solves chicken or egg problem. Consider the listing of the Collection>>sort:

sort: criteria | left right mediane |
    (self size < 2) ifTrue: [^self].

    mediane <- self popFirst.

    left  <- List new.
    right <- List new.
    self do: [ :x |
        (criteria value: x value: mediane)
            ifTrue:  [ left  add: x ]
            ifFalse: [ right add: x ] ].

    left  <- left  sort: criteria.
    right <- right sort: criteria.

    right add: mediane.
    ^ left appendList: right

This is suboptimal implementation of the Quicksort algorithm used here only for testing purposes. It performs two recursive calls to Collection>>sort: when sorting left and right sublists. The main difficulty here is to infer the return value of the Collection>>sort:. For example, analysis of the left refers to the outer context which at that point is not inferred completely, hence no return type available.

However, current implementation correctly solves the problem using a fact, that every recursion has it's base which should be evaluated prior to the next recursive call. By propagating the base return type from the outer context to the inner one we succeed in the whole process. See the log for an example of such inference. You may also see the resulting call graph as rendered svg or graphviz source.

Static analysis along with runtime statistics and polymorphic method cacheing provides enough information for effective dispatch of the Smalltalk code.

See also: #56, #58.

The implementation is not complete yet, there are a lot of things to do:

  • Implement inference of non-literal recursive messages, such as Collection>>sort:
  • Implement inference of composite types one per entry
  • Detect block escaping and avoid inference of the captured variables
  • Detect situations where inference will produce invalid results
  • Detect correct number of passes for guaranteed inference
  • Detect when method specialization is redundant and unnecessary
  • Solve the situation when phi incoming maps to a node from a dead path
  • Type inference walker should use breadth-first traverse
  • Analysis for block invocation kind in closure sites
  • Inference across phi nodes
  • Memory management, block inference caches, …
  • Implement inference of the remaining primitives
  • Use inferred types in actual code generation
  • Take care of non-static objects
  • Tests for all of this

0x7CFE added 30 commits June 18, 2016 18:29
Also branch optimization technique is proposed,
however it is currently lead to a malformed graph

Issue: #32
This allows us to classify graph edges during graph walk.
Previous version of optimizer merged taus incorrectly.
Method which produced error was Interval>>do:

Producer1 \
           { Aggregator <- Consumer } x 3
          /
Producer2
          \
           { Aggregator <- Consumer } x 3
Producer3 /

Correct solution should be

Producer1 \           / Consumer1
           Aggregator - Consumer2
          /           \ Consumer3
Producer2
          \           / Consumer4
           Aggregator - Consumer5
Producer3 /           \ Consumer6

but due to incorrect handling of pending nodes
lists algorithm yielded the following result:

Producer1 \
Producer2 - Aggregator <- Consumer x 6
Producer3 /

So there was a single aggregator node that was
referred by all six consumers.
AssignX instructions leave their argument on the stack.
That was causing problems during processing of argument requests.
It allows to compare any two types, store them in STL
container such as std::set, use them as a key in std::map
and use composition operators during type inference procedure.

Operator | is a disjunction-like operator used to get sum of
several possible types within a type.

For example:
    2 | 2 -> 2
    2 | 3 -> (2, 3)
    2 | * -> (2, *)
    (Object) | (SmallInt) -> ((Object), (SmallInt))

This operator may be used to aggregate possible types within
a linear sequence where several type outcomes are possible:
    x <- y isNil ifTrue: [ nil ] ifFalse: [ 42 ].

In this case x will have composite type (nil, 42).

On the other hand, when dealing with loops we need some
kind of a reduction operator that will act as a conjunction:

    2 & 2 -> 2
    2 & 3 -> (SmallInt)
    2 & (SmallInt) -> (SmallInt)

    <any type> & * -> *
    (SmallInt) & (Object) -> *

This operator is used during induction run of the type analyzer
to prove that variable does not leave it's local type domain,
i.e it's type is not reduced to a *.

Issue: #17
Meta info is very useful during type analysis.
It helps to make decisions based on graph structure.

In future, more flags will be added.

Issue: #17
This code need to be refactored properly.
In case if both operands are literal, then
result may be defined as literal too.

Otherwise primitive should "fail" by allowing
control flow to pass further.

For literal calculation it is best to use existing
code for software VM.

Issue: #17
kpp added a commit that referenced this pull request Aug 13, 2016
@kpp kpp force-pushed the feature/17/type_inference branch from 3a54498 to 9cfb35d Compare August 13, 2016 04:22
kpp added a commit that referenced this pull request Aug 13, 2016
@kpp kpp force-pushed the feature/17/type_inference branch from 9cfb35d to 6c130b3 Compare August 13, 2016 04:49
kpp added a commit that referenced this pull request Aug 13, 2016
@kpp kpp force-pushed the feature/17/type_inference branch from 6c130b3 to c8e71b9 Compare August 13, 2016 08:49
kpp added a commit that referenced this pull request Aug 13, 2016
@kpp kpp force-pushed the feature/17/type_inference branch from c8e71b9 to 1c03337 Compare August 13, 2016 09:31
kpp added a commit that referenced this pull request Aug 13, 2016
@kpp kpp force-pushed the feature/17/type_inference branch from 1c03337 to 8dd1ffa Compare August 13, 2016 10:11
kpp added a commit that referenced this pull request Aug 13, 2016
@kpp kpp force-pushed the feature/17/type_inference branch from 8dd1ffa to 9c654c5 Compare August 13, 2016 10:36
kpp added a commit that referenced this pull request Aug 13, 2016
kpp added a commit that referenced this pull request Aug 13, 2016
@kpp kpp force-pushed the feature/17/type_inference branch from 9c654c5 to a187a4b Compare August 13, 2016 10:48
kpp added a commit that referenced this pull request Aug 13, 2016
@kpp kpp force-pushed the feature/17/type_inference branch from a187a4b to cc8290a Compare August 13, 2016 10:56
kpp added a commit that referenced this pull request Aug 13, 2016
@kpp kpp force-pushed the feature/17/type_inference branch from cc8290a to 5dd2d12 Compare August 13, 2016 11:39
kpp added a commit that referenced this pull request Aug 13, 2016
@kpp kpp force-pushed the feature/17/type_inference branch from 5dd2d12 to 06d0901 Compare August 13, 2016 14:21
kpp added a commit that referenced this pull request Aug 13, 2016
@kpp kpp force-pushed the feature/17/type_inference branch from 06d0901 to 5d23406 Compare August 13, 2016 16:09
kpp added a commit that referenced this pull request Aug 13, 2016
@kpp kpp force-pushed the feature/17/type_inference branch from 5d23406 to 34c1048 Compare August 13, 2016 17:07
@coveralls
Copy link

coveralls commented Aug 13, 2016

Coverage Status

Coverage decreased (-12.7%) to 48.582% when pulling 34c1048 on feature/17/type_inference into ff4d76d on develop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants