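Deciding whether you need a visualization at all is step one. Silly as that may sound, it is probably something everyone (myself included) should do more often. A lot of the time a chart seems like a great way to show how much work you have been doing, but it can be completely ineffective and potentially harmful to what you are doing. If you decide that you really do need to visualize your data, you should have a rough idea of the options to consider. This post explains and demonstrates some of the common types of charts and plots.

This is part 3 of a series on data visualization.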

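Decide whether you actually need a visualization.

As with the best practices covered in Data Visualization – Part 1, first check your visualization against the following:

  • Does it clearly illustrate a relevant point?
  • Is it tailored to the appropriate audience?
  • Is it tailored to the presentation medium?
  • Is it memorable to those who care about the material?
  • Does it increase understanding of the subject matter?

If it can't do these things, you may not need a data visualization at all.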

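If you do need one, what is a good first step?

Take a look at the forum you are presenting in. Writing for a scientific journal is different from presenting live to an audience of a thousand. Think of a TED Talk compared with the Journal of Physics.

Key point: consider your audience!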

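Let's talk about high-level presentations. Everyone has seen a slide show full of fancy charts that add zero value. Don't be the person who presents that way! Serving up useless content will confuse your audience and is likely to bore them.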

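Say your point is to show the year-over-year change of a single metric. Instead of a chart, show it as a simple number on the page in a big, bold font. In this example, we are showing revenue over the last few years (note: be more specific about which type of revenue).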
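
As a rough sketch of the "one big number" idea (in Python here, not the R code behind this post), the headline figure can be computed and formatted in a few lines. The function name `yoy_headline` and the revenue values are hypothetical, made up only for illustration.

```python
def yoy_headline(previous: float, current: float) -> str:
    """Return the year-over-year change as a short headline string."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    # Relative change versus the prior year.
    change = (current - previous) / abs(previous)
    # "+.1%" keeps the sign and one decimal place, e.g. "+15.0%".
    return f"{change:+.1%} year over year"


# Hypothetical revenue figures, invented purely for illustration.
print(yoy_headline(previous=1_200_000, current=1_380_000))
```

A single string like this is what would go on the slide in a large font, with the detail left for the spoken part of the presentation.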

Which of the following slides makes more sense to include?

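If you agree with me, the one on the right will be much easier for people to understand in a presentation. It gets the point across with no processing required, so people can focus on what matters. You can then talk through any additional nuggets you want to point out.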

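Now let's talk about publishing content that isn't for academic use but reaches the masses (newspapers, magazines, blogs, etc.). These types of charts can cover a broad range of topics, so stick to the basics. The goal is to show information that is both interesting and valuable.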

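Here is a great example from Junk Charts. The author of the original Daily Kos article uses a "lie detector" chart type. The chart does several things well: it illustrates the relevant points, suits its audience and medium, and helps the reader understand the subject better. However, the original chart is so flashy that it detracts from its own effectiveness. Junk Charts took it to the next level by simplifying the colors and axes.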

Original version (Daily Kos)

Revised version (Junk Charts)

By merely looking at this chart you can see how it is ranked, a sense of scale, the comparison between people, and clearly labeled names. Fantastic work!

Rather than going over more examples of work others are doing, please visit Chart Porn (don’t worry about the name, it’s a great data visualization site) and Junk Charts. They have phenomenal examples of what to do (and what not to do) when publishing to the public.

You have a point, now what?

There is no rulebook as to how to display your data. However, as you have seen, there are both great and poor options. The choice is up to you – so think long and hard before making a decision (and you can always try a number of them out on people before publishing).

Ask yourself the following questions to help drive your decision:

  • Are you making a comparison?
  • Are you finding a relationship?
  • Are you showing a distribution?
  • Are you finding a trend over time?
  • Are you showing composition?

Once you know which question you are asking, it will keep your mind focused on the outcome and will quickly narrow down your charting options.

Rule of Thumb

  • Trend: Column, Line
  • Comparison: Area, Bar, Bullet, Column, Line, Scatter
  • Relationship: Line, Scatter
  • Distribution: Bar, Boxplot, Column
  • Composition: Donut, Pie, Stacked Bar, Stacked Column
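
The rule of thumb above is essentially a lookup table, so here is a minimal sketch of it as one (in Python, not the R used for this post). The names `CHART_OPTIONS` and `suggest_charts` are my own, not from any library.

```python
from typing import Dict, List

# The rule-of-thumb list above, keyed by the question being asked.
CHART_OPTIONS: Dict[str, List[str]] = {
    "trend": ["Column", "Line"],
    "comparison": ["Area", "Bar", "Bullet", "Column", "Line", "Scatter"],
    "relationship": ["Line", "Scatter"],
    "distribution": ["Bar", "Boxplot", "Column"],
    "composition": ["Donut", "Pie", "Stacked Bar", "Stacked Column"],
}


def suggest_charts(question: str) -> List[str]:
    """Return the candidate chart types for a question such as 'trend'."""
    key = question.strip().lower()
    if key not in CHART_OPTIONS:
        raise KeyError(f"unknown question type: {question!r}")
    return CHART_OPTIONS[key]


print(suggest_charts("Trend"))
```

It is only a starting point for narrowing the options; the final choice still depends on the audience and the medium.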

Obviously, there are plenty of choices beyond these, so don’t hesitate to use what works best. I will go over some of these basics and show some comparisons of poor charting techniques vs. slightly better ones.

For this project, I’ll use some oil production data that I found while digging through http://data.world (pretty great site). The data can be found here

Let’s load up some libraries and get started.


Trend – Line Chart

Objective: Visualize a trend in oil production in the US from 1981 – 2016 by year. I want to illustrate the changes over the time period. This is a very high-level view and only shows us a decline followed by a ramp up at the end of the period.

Poor Version

The x-axis is a disaster and the y-axis isn’t formatted well. While it gets the point across, it’s still worthless.

Better Version

The title gives us a much better understanding of what we’re looking at. The chart is slightly wider and the axes are formatted to be legible.
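
To make "formatted to be legible" a bit more concrete, here is a small sketch of the two fixes: fewer, evenly spaced year labels and thousands separators on the values. It is Python rather than the R behind this post. The five-year step and the comma format are assumptions, since the post does not say exactly how the axes were formatted, and both helper names are hypothetical.

```python
from typing import List


def year_breaks(start: int, end: int, step: int = 5) -> List[int]:
    """Years to label on the x-axis: multiples of `step` within [start, end]."""
    remainder = start % step
    first = start if remainder == 0 else start + (step - remainder)
    return list(range(first, end + 1, step))


def format_thousands(value: float) -> str:
    """Format an axis value with thousands separators and no decimals."""
    return f"{value:,.0f}"


# The 1981 to 2016 range comes from the objective above.
print(year_breaks(1981, 2016))
# Arbitrary example value, not a figure from the data set.
print(format_thousands(3250000))
```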


Comparison – Line Chart

Objective: Identify which states affected the trend the most. Evaluate them simultaneously in order to paint the picture and compare their trends over the time period. From this visual you can see the top states are Alaska, California, Louisiana, Oklahoma, Texas and Wyoming. Texas seems to break the mold quite drastically and drove the spike which occurred after 2010.

Poor Version

There are far too many colors going on here. Everything at the bottom of the chart is relatively useless and takes our focus away from the big players.

Better Version

This focuses attention on the top producing states. It compares them to each other and shows the trend per state as well. facet_wrap() is typically used to build what are known as "small multiples": a technique that breaks the visual components of the data into easy-to-understand pieces that make intuitive sense.
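
The data preparation behind small multiples is simple: pick the states worth showing, then give each one its own series. The sketch below does that in plain Python (not the R/ggplot2 code of this post). Ranking by total production is an assumption about how the "top" states were chosen, and the records are made-up placeholders.

```python
from collections import defaultdict


def top_states(records, n):
    """Return the n states with the largest total production, biggest first."""
    totals = defaultdict(float)
    for state, _year, production in records:
        totals[state] += production
    return sorted(totals, key=totals.get, reverse=True)[:n]


def split_into_panels(records, states):
    """Group (year, production) pairs by state, one series per panel."""
    panels = {state: [] for state in states}
    for state, year, production in records:
        if state in panels:
            panels[state].append((year, production))
    for series in panels.values():
        series.sort()  # order each panel's points by year
    return panels


# Made-up records (state, year, production); not the post's data.
records = [
    ("A", 2015, 900.0), ("A", 2016, 950.0),
    ("B", 2015, 180.0), ("B", 2016, 170.0),
    ("C", 2015, 20.0), ("C", 2016, 25.0),
]
leaders = top_states(records, n=2)
panels = split_into_panels(records, leaders)
print(leaders)
print(panels["B"])
```

Each entry in `panels` would then be drawn as its own small chart on shared axes.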


Relationship – Scatter Plot

Objective: Check to see if data from Alaska and California is correlated. While this isn’t extremely interesting, it does allow us to use this same data set (sorry). The charts indicate that there appears to be a strong positive correlation between the two states.

Poor Version

Lots of completely irrelevant data! The size of the point should have nothing to do with the year.

Better Version

The points are all the same size and a trend line helps to visualize the relationship. While it can sometimes be misleading, it makes sense with our current data.
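
For readers who want a number to go with the picture, here is a minimal sketch of a Pearson correlation and a straight least-squares trend line in plain Python. It assumes the trend line is linear; the post does not say which smoother was used. The two short series are invented for illustration and are not the Alaska and California figures.

```python
import math


def _centered_sums(xs, ys):
    """Return (sxx, syy, sxy, mean_x, mean_y) for two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy, mean_x, mean_y


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    sxx, syy, sxy, _, _ = _centered_sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    sxx, _, sxy, mean_x, mean_y = _centered_sums(xs, ys)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented yearly values for two hypothetical series.
series_a = [10.0, 12.0, 15.0, 19.0, 24.0]
series_b = [5.0, 6.5, 7.0, 9.5, 12.0]
print(round(pearson_r(series_a, series_b), 3))
print(fit_line(series_a, series_b))
```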

Distribution – Boxplot

Objective: Examine the range of production by state (per year) to give us an idea of the variance. While the sums and means are nice, it’s quite important to have an idea of distributions. While it was semi-apparent in the line charts, the variance of Texas is huge compared to the others!

Poor Version

Alphabetical order doesn't add any value, and the names overlap each other. While you can tell who the big players are, this visual does not add the value it should.

Better Version

This gives a nice ranking to the plot while still showing their distributions. We could take this a step further and separate out the big players from the small players (I’ll leave that up to you).
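
The reordering itself is a one-liner once the values are grouped by state. This plain-Python sketch ranks the groups by median, which is an assumption: the post does not say which statistic drives the ranking. The groups and numbers are placeholders.

```python
import statistics


def order_by_median(production_by_state):
    """Return state names sorted by median production, largest first."""
    return sorted(
        production_by_state,
        key=lambda state: statistics.median(production_by_state[state]),
        reverse=True,
    )


# Placeholder yearly values per state; not the post's data.
production_by_state = {
    "A": [1.0, 2.0, 3.0],
    "B": [10.0, 20.0, 30.0],
    "C": [5.0, 5.0, 5.0],
}
print(order_by_median(production_by_state))
```

The boxes are then drawn in that order instead of alphabetically.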

Composition – Stacked Bar

Objective: Check out the composition of total production by state. It’s interesting to see how the composition was relatively similar across decades until the 2010’s. Texas was 50% of the output!

Poor Version

My favorite, the beautiful pie chart! There’s nothing better than this… (no need for further commentary).

Better Version

The 1980’s and 2010’s will be missing years in terms of a “decade” due to the data provided (and it’s only 2017). While the percentage labels are slightly off center, it’s certainly much better than the pie chart. It’s not quite “apples-to-apples” for a comparison because I created different decades, but you get the idea.

I also created an “Other” category in order to simplify the output. When you are doing comparisons, it’s typically a good idea to find a way to reduce the number of variables in the output while not removing data by dropping it completely – do this carefully and transparently!
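
Here is a minimal sketch of that grouping step in plain Python (not the R behind this post): assign each year to a decade, fold every state outside a chosen set into "Other", and convert the totals to shares. The function names, the `keep` set, and the records are all hypothetical.

```python
from collections import defaultdict


def decade_of(year: int) -> str:
    """Label a year with its decade, e.g. 1985 -> '1980s'."""
    return f"{year // 10 * 10}s"


def composition_by_decade(records, keep):
    """Share of production per decade, with unlisted states lumped into 'Other'."""
    totals = defaultdict(lambda: defaultdict(float))
    for state, year, production in records:
        label = state if state in keep else "Other"
        totals[decade_of(year)][label] += production
    shares = {}
    for decade, by_label in totals.items():
        decade_total = sum(by_label.values())
        if decade_total == 0:
            continue  # nothing produced in this decade, so no shares to report
        shares[decade] = {
            label: value / decade_total for label, value in by_label.items()
        }
    return shares


# Made-up records (state, year, production); not the post's data.
records = [
    ("A", 1985, 60.0), ("B", 1985, 30.0), ("C", 1985, 10.0),
    ("A", 2012, 50.0), ("C", 2012, 50.0),
]
print(composition_by_decade(records, keep={"A"}))
```

Nothing is dropped here: the smaller states still count toward each decade's total, they just share one label.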

Some other fun concepts are below!

Some of them are nice, others are terrible! I won’t comment on any of them, but I felt it was necessary to include some other ideas I toyed around with.

Have fun with your data visualizations: be creative, think outside the box, use tools other than computers if it makes sense, fail often but learn quickly. I’m sure I’ll think of a thousand better ways to have illustrated the concepts in this post by tomorrow, so I’ll make updates as I think of them!

Now it’s your turn!

As always, the code used in this post is on my GitHub

Source: Data Visualization – Part 3 | R-bloggers