What is meant by directed acyclic graph?
A directed acyclic graph (DAG) is essentially a conceptual representation of a series of activities. This involves a graphic depicting the order of activities, where it is visually presented as a set of circles. Each of the circles represents an activity and some of the activities are connected by lines. The lines represent the flow from one activity to another one.
Every circle is known as a vertex and every line is known as an edge.
The word ‘Directed’ in ‘directed acyclic graph’ signifies that every edge has a defined direction. Essentially, this means that every edge must represent a single directional flow from one vertex to another. The word ‘acyclic’ in the term denotes that there are no loops (known as cycles) in the graph, so for any specific vertex, if you follow an edge that connects that vertex to another vertex, you will not be able to find any path in the graph to get back to that initial vertex.
What is Directed Acyclic Graph used for?
A DAG can be used to representing several different types of flows, including data processing flows. If you think of large-scale processing flows in terms of DAGs, it becomes possible for you to organize the various steps and the associated order for these jobs in a clearer fashion.
Directed acyclic graphs have a property called topological ordering. This means that it is possible to put the nodes of a DAG into a linear sequence with the nodes given an “ordering”. The nodes at the beginning of the sequence possess a lower value than the ones at the end of the sequence.
This property makes DAGs very efficient at multiple tasks (like identifying the shortest path from one node to another). It’s because of this property that DAGs have a very wide range of use-cases. Here are a few of those use-cases:
- Directed acyclic graphs are used in project management for the purpose of planning, designing, and implementing complex projects or tasks. As an example, in Apache Spark, directed acyclic graphs are used to represent a chain of Resilient Distributed Dataset (RDD) dependencies.
- Directed acyclic graphs are also used to develop faster and more affordable distributed ledgers. Most distributed ledgers like Blockchain have not been adopted widely because they aren’t very scalable, their speed is rather low, and transaction costs are very high. For example, the Bitcoin blockchain can only process 4 to 7 transactions per second, which makes it impossible to be adopted on a very wide scale. But a distributed ledger that is based on directed acyclic graphs is able to process hundreds of thousands of transactions every second with transaction costs that are much lower. This could make use cases like P2P energy trading quite possible, which blockchain has not been able to do yet.
- DAGs can even be used in medical and clinical studies to identify confounding and sources of bias. They make it easier to identify confounders and spurious relationships. Without a directed acyclic graph, erroneous conclusions would easily be reached.
- DAG representations of partial orderings also have multiple applications in scheduling for systems of tasks with ordering constraints. One important class of this kind of problem focuses on collections of objects that need to be updated. Here, a dependency graph refers to a graph that possesses a vertex for every object that has to be updated, and an edge connecting two objects whenever either of them has to be updated earlier than the other. Cycles in this graph are known as circular dependencies, and are generally not permitted, as it would not be possible to consistently schedule the tasks involved in the cycle. Dependency graphs that do not possess circular dependencies are essentially directed acyclic graphs.
- In data processing networks, a DAG could be used for representing a network of processing elements. Here, data could enter a processing element through its incoming edges and could leave the element through its outgoing edges. Dataflow programming languages describe systems of operations on data streams, and also the connections between the outputs of some operations and the inputs of other operations. These languages can be conveniently used to describe repetitive data processing tasks, in which the same acyclically connected collection of operations get applied to several data items. They can be executed as parallel algorithms where every operation gets performed by a parallel process as soon as another set of inputs is available to it.
- In compilers, a DAG that describes the inputs and outputs of every one of the arithmetic operations performed within the code can be used to represent straight-line code. Doing that makes it possible for the compiler to perform common subexpression elimination in a more efficient manner.
- Directed acyclic graphs even have applications in genealogy and version history. A family tree could essentially be looked at as a directed acyclic graph with a vertex for every family member and an edge for every parent-child relationship. These graphs may not exactly be trees due to the possibility of marriages between relatives which would cause a child to have a common ancestor on both the maternal and the paternal side, resulting in pedigree collapse. Family trees are acyclic since it isn’t possible for anyone to become their own ancestor.
What are the properties of a directed acyclic graph?
A DAG will have at least 4 properties, no less. These are:
A node or a vertex is a place where data is stored. These nodes are the circles that we mentioned earlier.
These are the lines or arrows that point in one direction from one vertex or node to another.
A great ancestral node
This is essentially a node that has no parents.
These are nodes or vertices that do not have any children. These leaf nodes represent identifiers, names, or constants.