Function followPaths

  • This function goes to all elements reachable from the given start element.

    Paths

    The function uses paths. A path is an execution path that describes a sequence of regex elements.

    I.e. there are two paths to go from a to b in the pattern /a(\w|dd)b/. The first path is a \w b and the second path is a d d b.

    However, the problem with paths is that there can be exponentially many because of combinatorial explosion (e.g. the pattern /(a|b)(a|b)(a|b)(a|b)(a|b)/ has 32 paths). To solve this problem, paths can be joined together again.

    I.e. in the pattern /a(\w|dd)b/, first element of all paths will be a. After a, the path splits into two. We call each of the split paths a fork. The two forks will be a ( \w and a ( d d. The ( is used to indicate that a fork was made. Since both paths come together after the group ends, they will be joined. The joined path of a ( \w and a ( d d will be written as a ( \w | d d ). The ) is used to indicate that forks have been joined. The final path will be a ( \w | d d ) b.

    This method of forking and joining works for alternations but it won't work for quantifiers. This is why quantifiers will be treated as single elements that can be entered. By default, a quantifier q will be interpreted as ( q | ) if its minimum is zero and as ( q ) otherwise.

    I.e. in the pattern /ab*c/, the paths are a ( b* | ) c, and in /ab+c/, the path is a b+ c.

    State

    Paths are thought of as a sequence of elements and they are represented by state (type parameter S). All operations that fork, join, or assert paths will operate on state and not a sequence of elements.

    State allows operations to be implemented more efficiently and ensures that only necessary data is passed around. An analysis of paths usually tracks properties and analyses how these properties change, the current value of these properties is state.

    Operations

    Operations act upon state and are specific to the type of state. They define how state changes when entering/leaving/asserting elements and how paths fork, join, and continue.

    Operation sequence

    To follow all paths, two methods are necessary: one method that enters elements and one that determines the next element. These methods will be called Enter and Next respectively. These methods will call the given operations roughly like this:

    function Enter(element, state):
        operations.enter
        if operations.continueInto:
            if element.type == GROUP:
                operations.join(
                    element.alternatives.map(e => Enter(e, operations.fork(state)))
                )
            if element.type == QUANTIFIER:
                if element.max == 0:
                    // do nothing
                else if element.min == 0:
                    operations.join([
                        state,
                        Enter(quantifier, operations.fork(state))
                    ])
                else:
                    Enter(quantifier, operations.fork(state))
            if element.type == LOOKAROUND:
                operations.assert(
                    state,
                    operations.join(
                        element.alternatives.map(e => Enter(e, operations.fork(state)))
                    )
                )
        operations.leave
        Next(element, state)
    
    function Next(element, state):
        if operations.continueAfter:
            if noNextElement:
                operations.endPath
            else:
                Enter(nextElement, state)
    

    (This is just simplified pseudo code but the general order of operations will be the same.)

    Runtime

    If n elements can be reached from the given starting element, then the average runtime will be O(n) and the worst-case runtime will be O(n^2).

    Type Parameters

    • S

      The type of the state.

    Parameters

    • start: Alternative | Element
    • startMode: "enter" | "next"

      If "enter", then the first element to be entered will be the starting element. If "leave", then the first element to continue after will be the starting element.

    • initialState: S
    • operations: FollowOperations<S>
    • Optional direction: MatchingDirection

      The direction in which paths will be followed. If undefined, then the natural matching direction (getMatchingDirection) of the start element will be used.

    Returns S