ggraph: ggplot for graphs


A graph, a collection of nodes connected by edges, is just data. Whether it’s a social network (where nodes are people, and edges are friend relationships), or a decision tree (where nodes are branch criteria or values, and edges decisions), the nature of the graph is easily represented in a data object. It might be represented as a matrix (where rows and columns are nodes, and elements mark whether an edge between them is present) or as a data frame (where each row is an edge, with columns representing the pair of connected nodes).
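The two representations mentioned above can be sketched in base R. This is a minimal illustration, and the names and friendships are hypothetical: an edge-list data frame is converted to an adjacency matrix and back.

```r
# Edge list: one row per (directed) edge, columns name the connected nodes.
# The people and friendships here are made up for illustration.
edges <- data.frame(
    from = c('Ann', 'Ann', 'Bob', 'Cat', 'Dan'),
    to   = c('Bob', 'Cat', 'Cat', 'Dan', 'Ann'),
    stringsAsFactors = FALSE
)

# Adjacency matrix: rows and columns are nodes, 1 marks an edge row -> column
nodes <- sort(unique(c(edges$from, edges$to)))
adj <- matrix(0L, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))
adj[cbind(edges$from, edges$to)] <- 1L

# Going back from the matrix to an edge list
idx <- which(adj == 1L, arr.ind = TRUE)
edges_back <- data.frame(from = nodes[idx[, 'row']],
                         to   = nodes[idx[, 'col']],
                         stringsAsFactors = FALSE)
```

Either form carries the same information; the data frame is usually more convenient when edges have their own attributes, since those can simply be added as columns.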

The trick comes in how you represent a graph visually; there are many different options, each with strengths and weaknesses when it comes to interpretation. A graph with many nodes and edges may become an unintelligible hairball without careful arrangement, and including directionality or other attributes of edges or nodes can reveal insights about the data that wouldn’t be apparent otherwise. There are many R packages for creating and displaying graphs (igraph is a popular one, and this CRAN task view lists many others), but that’s a problem in its own right: an important part of the data exploration process is trying and comparing different visualization options, and the myriad packages and interfaces make that process difficult for graph data.

Now, there’s the new ggraph package, recently published to CRAN by author Thomas Lin Pedersen, which promises to make exploring graph data easier. Unlike other graphing packages, ggraph uses the grammar of graphics paradigm of the ggplot2 package, unifying the data structures and attributes associated with graphics. It also includes a wide range of visual representations of graphs (layouts) and makes it easy to switch between them. The basic “mesh” visualization of nodes and edges provides 11 different options for arranging the nodes.


Other types of visualizations are supported, too: hive plots, dendrograms, treemaps, and circle plots, to name just a few. Note that only static graphs are available, though: unlike igraph and some other packages, you can’t rearrange the location of the nodes or otherwise manipulate the graphics with a mouse.

For the R programmer, most of the work is done by the ggraph function. It’s analogous to the ggplot function, except that you don’t provide data for the locations of the nodes; their position is selected by an algorithm. (Similarly, layout choices are automatically made for visualization types other than the mesh.) There are also various themes suited to graphs that you can use to style your chart: goodbye gridlines and axes; hello labels, annotations and edge arrows.

The ggraph package is available on CRAN now, and works with R version 2.10 and later. For more on the ggraph package, see the announcement blog post linked below.

Data Imaginist: Announcing ggraph: A grammar of graphics for relational data

Source: ggraph: ggplot for graphs | R-bloggers

The ggraph package: a grammar of graphics for relational data


Announcing ggraph: A grammar of graphics for relational data

FEBRUARY 23, 2017

I am absolutely thrilled to announce that ggraph has finally been released on CRAN. ggraph is my most ambitious package to date and its very early genesis has been described in a prior post. If any mention of ggraph is completely new to you, then in short terms ggraph is an extension of the ggplot2 API to support relational data such as networks and trees. I feel fairly confident in saying that ggraph is the most powerful way to create static network-based visualizations in R. Leading up to the release, the three main concepts of ggraph have been described in detail in their own blog posts (layouts, nodes, and edges), so this will not be reiterated here. Instead I’ll talk a bit about the philosophy behind the package as well as show off some of the features that do not fall into any of the three main concepts.

The Philosophy

There is no shortage of software for creating network visualizations and there is no shortage of said visualizations themselves. Often though, the visualizations are more impressive than informative and it is easy to feel that their main task is to show that we are really dealing with some complex data. All of this has led to a certain disdain for classic network visualizations perfectly encapsulated in the nickname hairballs. It does not have to be like this! The greatness of ggplot2 lies in how it allows users to quickly iterate over visualization approaches, thus better ensuring that the best visualization approach is reached. If this was extended to relational data it is my belief that users would be more likely to try to make plots that are more meaningful. After all we all want interpretability, right? Consider having to try out 7 different network visualization packages with different APIs versus just mixing and matching layouts and geoms in an iterative process — I know which way I prefer.

The goal of ggraph is thus clear — provide everything related to visualizations of relational data in a ggplot2-like API to lessen the cognitive load on experimenting with different visual representations. I’m not there yet, but I feel the current version represents a solid foundation where most users will not feel many limitations — on the contrary I believe most users will feel like the chains have come off and they are set free.

Future focus

As I pointed out, ggraph is far from done. I’ll try to keep my development focus in the open by putting things on the road-map as GitHub issues. Honorable mentions include matrix, d3-force and sankey layouts, expanded support for edge endings (more choices than grid::arrow() provides), edge routing (avoid node collision), and textbox nodes. I welcome all suggestions as the world of network visualizations is moving fast and I cannot keep on top of everything.

Features besides layouts, nodes, and edges

Understanding the node and edge geoms along with how layouts are defined will get you a long way towards visualizing networks. Still, ggraph has more to offer, some of which will be discussed here:


Consider the following plot:

library(ggraph)
library(igraph)

graph <- graph_from_data_frame(highschool)

p <- ggraph(graph, layout = 'kk') + 
    geom_edge_link(aes(colour = factor(year))) + 
    geom_node_point() + 
    ggtitle('An example')


While the ggplot2 heritage clearly shows in the grey background with white grid lines, x and y axes are often redundant in network visualizations and are just a distraction. ggraph provides its own theme optimized for network visualizations, theme_graph(), that facilitates clean and beautiful visualizations:

p + theme_graph()


theme_graph(), besides removing axes, grids, and border, changes the font to Arial Narrow (this can be overridden). Furthermore, it makes it easy to change the coloring of the plot:

p + theme_graph(background = 'grey20', text_colour = 'white')


Adding the same theme to every plot is tedious and ggraph provides a way to avoid this. Using set_graph_style(), theme_graph() is set as the default. As an extra benefit, all text-based geoms get their defaults updated so the text automatically uses the same style as the theme.
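A minimal sketch of that workflow follows. It assumes the `family` argument of set_graph_style() is set to 'sans' (my choice, not the post's) so the example does not depend on Arial Narrow being installed; the plot itself mirrors the earlier example.

```r
library(ggraph)
library(igraph)

# Make theme_graph() the default theme for subsequent plots.
# Assumption: 'sans' replaces the default Arial Narrow font so the
# sketch does not rely on that font being available.
set_graph_style(family = 'sans')

graph <- graph_from_data_frame(highschool)

# No explicit theme_graph() call is needed once the style is set
p <- ggraph(graph, layout = 'kk') +
    geom_edge_link(aes(colour = factor(year))) +
    geom_node_point() +
    ggtitle('An example')
```

Because the style is set globally for the session, it is probably best placed once near the top of a script rather than repeated before each plot.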





A powerful but underutilized way of gaining insight into networks is by using small multiples. This technique can reduce edge over-plotting in a very meaningful way by spreading nodes and edges out based on their attributes. The benefits of small multiples are not unique to relational data, as the popularity of ggplot2’s facetting functionality shows. The base facetting functions provided by ggplot2 are a bad fit for networks though, as we are working with two very distinct types of data. If you facet on a node attribute, all edges would be plotted in all panels, despite the terminal nodes not being present, which is not what you expect. Because of this, ggraph comes with its own set of facetting functions tailored to network data:

facet_nodes() and facet_edges()

These two functions are equivalent to facet_wrap() in functionality, but they only address node and edge data respectively. When using facet_nodes() edges are only drawn in a panel if both terminal nodes are present there. When using facet_edges() nodes are always drawn in all panels even if the node data contains an attribute named the same as the one used for the edge facetting.

# Assign each node to a random class
V(graph)$class <- sample(letters[1:4], gorder(graph), TRUE)
# Make year a character
E(graph)$year <- as.character(E(graph)$year)

p <- ggraph(graph, layout = 'kk') + 
    geom_edge_fan(aes(alpha = ..index.., colour = year)) + 
    geom_node_point(aes(shape = class)) + 
    scale_edge_alpha(guide = 'none')

p + facet_edges(~year)


Often, when working with small multiples it is nice to have some visual separation between each plot — setting a foreground color in theme_graph() will add strip background and border (you can also use the th_foreground() helper for this):

p + facet_nodes(~class) + th_foreground(foreground = 'grey80', border = TRUE)


# Let's not have to add this every time
set_graph_style(foreground = 'grey80')


Facetting on two variables simultaneously is very powerful and something that is supported in ggplot2 with facet_grid(). In ggraph the same is possible using facet_graph(), which takes the behavior of facet_nodes() and facet_edges() and combines them:

p + facet_graph(year ~ class)


As with facet_grid() marginal plots are supported as well:

p + facet_graph(year ~ class, margins = TRUE)


While the default is to facet the rows on edges and the columns on nodes, this can be changed using the row_type and col_type arguments. There is nothing stopping you from facetting on the same type in each dimension either:

# Facet edge by the class of their start node as well as year
p + facet_graph(year ~ node1.class, col_type = 'edge')


I hope I have convinced you that facetting in the context of relational data is both very easy and extremely powerful. Avoiding the hairball is one of the prime goals of network visualizations, and using small multiples is a fantastic way of cutting down on the number of nodes and edges while still getting the full picture.


Source: Data Imaginist – Announcing ggraph: A grammar of graphics for relational data

Introducing the R package ggraph: Nodes


This is the second post in my series of ggraph introductions. The first post introduced the concept of layouts, which is simply a specification on how nodes should be placed on a plane. This post will dive into how the nodes are drawn, once a layout has been calculated.


Nodes in a network are the entities that are connected. Sometimes these are also referred to as vertices, but ggraph has opted for this nomenclature and uses it consistently. While the nodes in a graph are the abstract concepts of entities, and the layout is their physical placement, the node geoms are the visual manifestation of the entities. Conceptually one can simply think of it in terms of a scatter plot: the layout provides the x and y coordinates, and these can be used to draw nodes in different ways in the plotting window. Actually, due to the design of ggraph, the standard scatterplot-like geoms from ggplot2 can be used directly for plotting nodes:

gr <- graph_from_data_frame(highschool)

ggraph(gr, layout = 'kk') +
    geom_point(aes(x = x, y = y))

The reason this works is that, as discussed in the previous post, layouts return a data.frame of node positions and metadata and this is used as the default plot data:

head(createLayout(gr, layout = 'kk'))
#>            x          y name circular ggraph.index
#> 1  0.2782438  2.4944195    1    FALSE            1
#> 2  0.1365268  3.1063039    2    FALSE            2
#> 3  0.9329938  3.2168940    3    FALSE            3
#> 4 -2.5457734 -1.5139415    4    FALSE            4
#> 5 -2.8447634 -0.2267242    5    FALSE            5
#> 6 -2.9897376  1.7369304    6    FALSE            6

While use of the default ggplot2 geoms is absolutely allowed, ggraph comes with its own set of node geoms. Many of these are direct translations of ggplot2’s own geoms like geom_point(), so one could wonder why bother to use them.

The first reason is to provide clear code. It is not apparent anywhere that the standard geoms are addressing the nodes and using geom_node_*() makes it clear that this layer will draw nodes.

The second reason is that it will save typing. Since ggraph is in control of the shape of the input data through the layout calculations, it knows that the x and y positions are encoded in an x and y column. This means that geom_node_*() can default the x and y aesthetics so there’s no need to type them:

ggraph(gr, layout = 'kk') +
    geom_node_point()

Sometimes there is a need for addressing the x and y aesthetics, which is still possible, for instance if a partition layout should be inverted:

gr <- graph_from_data_frame(flare$edges, vertices = flare$vertices)

ggraph(gr, layout = 'partition') +
    geom_node_tile(aes(y = -y, fill = depth))

Of course this could also be accomplished by reversing the y-axis using scale_y_reverse(), so this is just to illustrate that the defaults are easily overridden if needed.

The third reason is the added functionality. All ggraph geoms get a filter aesthetic that allows you to quickly filter the input data. The use of this can be illustrated when plotting a tree:

ggraph(gr, layout = 'dendrogram', circular = TRUE) +
    geom_edge_diagonal() +
    geom_node_point(aes(filter = leaf))

In the above plot only the terminal nodes are drawn by filtering on the logical leaf column provided by the dendrogram layout.

The different node geoms

The usual suspects are of course provided in the form of geom_node_point() (showcased above), geom_node_text(), and geom_node_label(). These work as expected, taking in the usual aesthetics (plus filter). Only x and y are defaulted, so everything else must be provided, e.g. label, which does not default to the name column as it does in igraph. One feature sets geom_node_text() and geom_node_label() apart from their ggplot2 counterparts: both have a repel argument that, when set to TRUE, will use the repel functionality provided by the ggrepel package to avoid overlapping text.
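As a rough sketch of the point about labels: the label aesthetic has to be mapped explicitly, and repel can be switched on in the same call. The example assumes the highschool graph used elsewhere in this post and the `name` column that the layout data carries for it.

```r
library(ggraph)
library(igraph)

gr <- graph_from_data_frame(highschool)

# label is not defaulted, so it is mapped to the 'name' column here;
# repel = TRUE hands label placement to ggrepel to reduce overlaps.
p <- ggraph(gr, layout = 'kk') +
    geom_edge_link() +
    geom_node_point() +
    geom_node_text(aes(label = name), repel = TRUE)
```

Swapping geom_node_text() for geom_node_label() should give boxed labels with the same arguments.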

Apart from these three geoms there’s a set of geoms mainly useful for spatial node layouts such as treemaps, partition, and circle packing. geom_node_tile() is the ggraph counterpart to ggplot2’s geom_tile(), while geom_node_circle() and geom_node_arc_bar() map to ggforce’s geom_circle() and geom_arc_bar(). Common to these is that the spatial dimensions of the geoms (e.g. radius, width, and height) are precalculated by their intended layouts and defaulted by the geoms:

ggraph(gr, layout = 'treemap', weight = 'size') +
    geom_node_tile(aes(fill = depth))

All spatial node geoms are center-based, meaning that the x and y values of the layout refer to the center of the node and not e.g. the bottom-left corner. This makes it easier to add labels to spatial layouts as well as using spatial layouts in a non-spatial way:

l <- ggraph(gr, layout = 'partition', circular = TRUE)
l + geom_node_arc_bar(aes(fill = depth))

l + geom_edge_diagonal(aes(width = ..index.., alpha = ..index..), lineend = 'round') +
    scale_edge_width(range = c(0.2, 1.5)) +
    geom_node_point(aes(colour = depth))
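What "center-based" means for a tile can be made concrete with a little arithmetic. The sketch below uses a made-up data frame (not output from any ggraph layout) and derives the corner coordinates from the center plus width and height.

```r
# Hypothetical tiles described by their center and extent
tiles <- data.frame(
    x = c(1, 3), y = c(2, 2),
    width = c(2, 2), height = c(1, 3)
)

# The corners follow from the center; a label placed at (x, y)
# lands in the middle of the tile without further computation.
corners <- transform(
    tiles,
    xmin = x - width / 2,  xmax = x + width / 2,
    ymin = y - height / 2, ymax = y + height / 2
)
```

This is why the same x and y columns can feed both a tile geom and a point or text geom, as in the plots above.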

More node geoms are sure to appear in ggraph with time, but they will generally be quite easily comprehensible due to their strong resemblance to the standard ggplot2 geoms. After all it is just points on a plane…

More to come

This concludes our tour of the different ways to draw nodes in ggraph. Next up is edges and it is fair to say that this is where it really gets exciting. Stay tuned!


Source: Data Imaginist – Introduction to ggraph: Nodes

Introducing the visualization package ggraph




In very short terms, a layout is the vertical and horizontal placement of nodes when plotting a particular graph structure. Conversely, a layout algorithm is an algorithm that takes in a graph structure (and potentially some additional parameters) and returns the vertical and horizontal position of the nodes. Often, when people think of network visualizations, they think of node-edge diagrams where the aim is to plot strongly connected nodes in close proximity. Layouts can be a lot of other things too, though, e.g. hive plots and treemaps. One of the driving factors behind ggraph has been to develop an API where any type of visual representation of graph structures is supported. In order to achieve this we first need a flexible way of defining the layout…
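To make the definition concrete, here is a toy layout algorithm in base R. It is a hypothetical helper, not part of ggraph or igraph: it takes an edge list and returns one x and y position per node, placing the nodes evenly on a unit circle.

```r
# A toy layout algorithm: graph structure in, node positions out.
# 'circle_layout' is an illustrative name, not a ggraph function.
circle_layout <- function(edges) {
    nodes <- unique(c(edges$from, edges$to))
    angle <- 2 * pi * (seq_along(nodes) - 1) / length(nodes)
    data.frame(name = nodes, x = cos(angle), y = sin(angle),
               stringsAsFactors = FALSE)
}

edges <- data.frame(from = c('a', 'b', 'c'),
                    to   = c('b', 'c', 'a'),
                    stringsAsFactors = FALSE)
pos <- circle_layout(edges)
```

This layout ignores the connections entirely; the more interesting algorithms differ mainly in how they use the edge information to choose the positions.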

ggraph() and createLayout()

As the layout is a global specification of the spatial position of the nodes it spans all layers in the plot and should thus be defined outside of calls to geoms or stats. In ggraph it is often done as part of the plot initialization using ggraph() — a function equivalent in intent to ggplot(). As a minimum ggraph() must be passed a graph object supported by ggraph:


# Not specifying the layout - defaults to "auto"
ggraph(graph) +
    geom_edge_link(aes(colour = factor(year)))
Not specifying a layout will make ggraph pick one for you. This is only intended to get quickly up and running. The choice of layout should be deliberate on the part of the user as it will have a great effect on what the end result will communicate. From now on all calls to ggraph() will contain a specification of the layout:

ggraph(graph, layout = 'kk') +
    geom_edge_link(aes(colour = factor(year)))

If the layout algorithm accepts additional parameters (most do), they can be supplied in the call to ggraph() as well:

ggraph(graph, layout = 'kk', maxiter = 100) +
    geom_edge_link(aes(colour = factor(year)))

In addition to specifying the layout during plot creation, it can also be done separately using createLayout(). This function takes the same arguments as ggraph() but returns a layout_ggraph object that can later be used in place of a graph structure in a ggraph() call:

layout
#>           x         y name circular ggraph.index
#> 1 -7.734004 10.085789    1    FALSE            1
#> 2 -8.251559  9.226503    2    FALSE            2
#> 3 -7.205127 10.455535    3    FALSE            3
#> 4 -7.113050 11.326465    4    FALSE            4
#> 5 -7.748919 10.742258    5    FALSE            5
#> 6 -7.355531  9.702643    6    FALSE            6
#> $names
#> [1] "x"            "y"            "name"         "circular"
#> [5] "ggraph.index"
#> $row.names
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
#> [24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
#> [47] 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69
#> [70] 70
#> $class
#> [1] "layout_igraph" "layout_ggraph" "data.frame"
#> $graph
#> IGRAPH DN-- 70 506 --
#> + attr: name (v/c), year (e/n)
#> + edges (vertex names):
#> [1] 1 ->14 1 ->15 1 ->21 1 ->54 1 ->55 2 ->21 2 ->22 3 ->9 3 ->15 4 ->5
#> [11] 4 ->18 4 ->19 4 ->43 5 ->19 5 ->43 6 ->13 6 ->20 6 ->22 7 ->17 8 ->14
#> [21] 8 ->17 9 ->12 9 ->20 9 ->21 9 ->22 9 ->51 11->19 11->50 11->52 11->53
#> [31] 12->20 12->21 12->22 13->17 13->20 13->21 13->22 14->21 14->22 15->20
#> [41] 16->18 16->41 16->43 17->7 17->8 18->11 18->16 18->19 19->4 19->11
#> [51] 19->16 19->18 19->27 20->6 20->12 20->21 20->22 20->38 21->22 21->51
#> [61] 21->54 21->55 22->20 22->21 22->38 22->51 23->40 23->43 23->50 23->52
#> [71] 23->53 23->60 23->62 23->65 23->68 24->51 26->32 26->35 26->36 26->40
#> + … omitted several edges
#> $circular
#> [1] FALSE

As it is just a data.frame, any standard ggplot2 call will work by addressing the nodes. Still, use of the geom_node_*() family provided by ggraph is encouraged as it makes it explicit which part of the data structure is being worked with.

Adding support for new data sources

Out of the box ggraph supports dendrogram and igraph objects natively as well as hclust and network through conversion to one of the above. If there is a wish for support for additional classes, this can be achieved by adding a set of specific methods for the class. The ggraph source code should be your guide in this, but I will briefly describe the methods below:


The createLayout method for the class, createLayout.myclass(), is responsible for taking a graph structure and returning a layout_ggraph object. The object is just a data.frame with the correct class and attributes added. The class should be c('layout_myclass', 'layout_ggraph', 'data.frame') and it should at least have a graph attribute holding the original graph object as well as a circular attribute with a logical giving whether the layout has been transformed to a circular representation or not. If the graph structure contains any additional information about the nodes, this should be added to the data.frame as columns so it is accessible during plotting.

A second method takes the return value of createLayout.myclass() and returns the edges of the graph structure. The return value should be in the form of an edge list with a to and from column giving the indexes of the terminal nodes of the edge. Furthermore, it must contain a circular column, again indicating whether the layout should be considered circular. If there is any additional data attached to the edges in the graph structure, it should be added as columns to the data.frame.

A third method is intended to return the shortest path between two nodes as a list of node indexes. This method can be omitted, but doing so will result in lack of support for geom_conn_* layers.

Finally, any type of layout algorithm that needs to be available to this class should be defined as a separate layout_myclass_layoutname() function. This function will be called when 'layoutname' is used in the layout argument in ggraph() or createLayout(). At a minimum each new class should have a layout_myclass_auto() defined.
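The conversion route mentioned for hclust objects can be illustrated with base R alone. This is only a sketch of the conversion itself, using the built-in USArrests data as a stand-in; how ggraph performs the conversion internally is not shown in the post, so treating as.dendrogram() as the mechanism is an assumption.

```r
# Hierarchical clustering of a small built-in data set
hc <- hclust(dist(USArrests[1:10, ]))

# hclust -> dendrogram, one of the classes supported natively.
# Assumption: this mirrors the kind of conversion the text refers to.
dend <- as.dendrogram(hc)

# The dendrogram keeps the merge heights and the number of leaves
n_leaves    <- attr(dend, 'members')
root_height <- attr(dend, 'height')
```

The resulting object carries height information for every branch point, which is what the dendrogram layout discussed later relies on.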

Layouts abound

There are a lot of different layouts in ggraph, first and foremost because igraph implements a lot of layouts for drawing node-edge diagrams and all of these are available in ggraph. Additionally, ggraph provides a lot of new layout types and algorithms for your drawing pleasure.

A note on circularity

Some layouts can be shown effectively both in a standard Cartesian projection as well as in a polar projection. The standard approach in ggplot2 has been to change the coordinate system with the addition of e.g. coord_polar(). This approach — while consistent with the grammar — is not optimal for ggraph as it does not allow layers to decide how to respond to circularity. The prime example of this is trying to draw straight lines in a plot using coord_polar(). Instead circularity is part of the layout specification and gets communicated to the layers with the circular column in the data, allowing each layer to respond appropriately. Sometimes standard and circular representations of the same layout get used so often that they get different names. In ggraph they’ll have the same name and only differ in whether or not circular is set to TRUE:

# An arc diagram
ggraph(graph, layout = 'linear') +
    geom_edge_arc(aes(colour = factor(year)))

# A chord diagram
ggraph(graph, layout = 'linear', circular = TRUE) +
    geom_edge_arc(aes(colour = factor(year)))

graph
# An icicle plot
ggraph(graph, 'partition') +
    geom_node_tile(aes(fill = depth), size = 0.25)

# A sunburst plot
ggraph(graph, 'partition', circular = TRUE) +
    geom_node_arc_bar(aes(fill = depth), size = 0.25)

Not every layout has a meaningful circular representation, in which case the circular argument will be ignored.

Node-edge diagram layouts

igraph provides a total of 13 different layout algorithms for classic node-edge diagrams (colloquially referred to as hairballs). Some of these are incredibly simple, such as randomly, grid, circle, and star, while others try to optimize the position of nodes based on different characteristics of the graph. There is no such thing as “the best layout algorithm”, as algorithms have been optimized for different scenarios. Experiment with the choices at hand and remember to take the end result with a grain of salt, as it is just one of a range of possible “optimal node position” results. Below is an animation showing the different results of running all applicable igraph layouts on the highschool graph.

igraph_layouts 'randomly', 'fr', 'kk', 'drl', 'lgl')
igraph_layouts graph V(graph)$degree layouts layouts_tween statelength = 1, ease = 'cubic-in-out',
nframes = length(igraph_layouts) * 16 + 8)
title_transp for (i in seq_len(length(igraph_layouts) * 16)) {
tmp_layout layout title_alpha p geom_edge_fan(aes(alpha = ..index.., colour = factor(year)), n = 15) +
    geom_node_point(aes(size = degree)) +
    scale_edge_color_brewer(palette = 'Dark2') +
    ggtitle(paste0('Layout: ', layout)) +
    theme_void() +
    theme(legend.position = 'none',
          plot.title = element_text(colour = alpha('black', title_alpha)))
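A static way to follow the advice about experimenting is to build the same plot once per layout and compare the results side by side. The sketch below is not the animation code from the post; it simply loops over three layout names ('kk', 'fr' and 'circle', chosen here for illustration) on the highschool graph.

```r
library(ggraph)
library(igraph)

graph <- graph_from_data_frame(highschool)

# Layout names chosen for illustration; any applicable layout works
layouts <- c('kk', 'fr', 'circle')

# One plot per layout, identical apart from the node placement
plots <- lapply(layouts, function(l) {
    ggraph(graph, layout = l) +
        geom_edge_link(aes(colour = factor(year))) +
        geom_node_point() +
        ggtitle(paste0('Layout: ', l))
})
names(plots) <- layouts
```

Printing plots$kk, plots$fr and plots$circle in turn shows how much the impression of the same network depends on the layout alone.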

Hive plots

A hive plot, while still technically a node-edge diagram, is a bit different from the rest as it uses information pertaining to the nodes, rather than the connection information in the graph. This means that hive plots are, to a certain extent, more interpretable as well as less vulnerable to small changes in the graph structure. They are less common though, so their use will often require some additional explanation.

V(graph)$friends V(graph)$friends = 15, 'many', 'medium'))
ggraph(graph, 'hive', axis = 'friends', = 'degree') +
    geom_edge_hive(aes(colour = factor(year), alpha = ..index..)) +
    geom_axis_hive(aes(colour = friends), size = 3, label = FALSE)

Hierarchical layouts

Trees and hierarchies are an important subset of graph structures, and ggraph provides a range of layouts optimized for their visual representation. Some of these use enclosure and position rather than edges to communicate relations (e.g. treemaps and circle packing). Still, these layouts can just as well be used for drawing edges if you wish to:

graph
set.seed(1)
ggraph(graph, 'circlepack', weight = 'size') +
    geom_node_circle(aes(fill = depth), size = 0.25, n = 50)

ggraph(graph, 'circlepack', weight = 'size') +
    geom_edge_link() +
    geom_node_point(aes(colour = depth))

ggraph(graph, 'treemap', weight = 'size') +
    geom_node_tile(aes(fill = depth), size = 0.25)

ggraph(graph, 'treemap', weight = 'size') +
    geom_edge_link() +
    geom_node_point(aes(colour = depth))

The most recognized tree plot is probably the dendrogram, though. Both igraph and dendrogram objects can be plotted as dendrograms, though only dendrogram objects come with built-in height information for placing the branch points. For igraph objects this is inferred by the longest ancestral length:

ggraph(graph, 'dendrogram') +

dendrogram ggraph(dendrogram, 'dendrogram') +
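One plausible reading of how a height can be inferred for a tree without explicit height information is "the longest distance from a node down to any leaf". The base R sketch below implements that reading on a small hypothetical tree; it is an illustration of the idea, not ggraph's actual code.

```r
# A small hypothetical tree given as a parent -> child edge list
edges <- data.frame(
    from = c('root', 'root', 'a', 'a', 'c'),
    to   = c('a',    'b',    'c', 'd', 'e'),
    stringsAsFactors = FALSE
)

# Height of a node: 0 for a leaf, otherwise one more than the
# tallest child (i.e. the longest path down to a leaf).
node_height <- function(node, edges) {
    children <- edges$to[edges$from == node]
    if (length(children) == 0) return(0)
    1 + max(vapply(children, node_height, numeric(1), edges = edges))
}

nodes <- unique(c(edges$from, edges$to))
heights <- vapply(nodes, node_height, numeric(1), edges = edges)
```

With heights defined this way all leaves sit at height 0 and the root at the top, which matches the familiar look of a dendrogram.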

Dendrograms are one of the layouts that are amenable to circular transformations, which can be effective in giving more space at the leaves of the tree at the expense of the space given to the root:

ggraph(dendrogram, 'dendrogram', circular = TRUE) +
    geom_edge_elbow()

More to come

This concludes the first of the introduction posts about ggraph. I hope I have been effective in describing the use of layouts and illustrating how they can have a very profound effect on the resulting plot. Stay tuned for more…


Source: Data Imaginist – Introduction to ggraph: Layouts