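Data science for business (DS4B) is the future of business analytics, yet it is still hard to figure out where to start. The last thing you want to do is waste time on the wrong tool. Making effective use of your time involves two things: (1) selecting the right tool for the job, and (2) efficiently learning how to use that tool to return business value. This article focuses on the first part, explaining in six points why R is the right choice. Our next article will focus on the second part: learning R in 12 weeks.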
REASON 1: R HAS THE BEST OVERALL QUALITIES
There are a number of tools available for business analysis and business intelligence (with DS4B being a subset of this area). Each tool has its pros and cons, many of which are important in the business context. We can use these attributes to compare how each tool stacks up against the others. We did a qualitative assessment using several criteria:
Business Capability (1 = Low, 10 = High)
Ease of Learning (1 = Difficult, 10 = Easy)
Cost (Free/Minimal, Low, High)
Trend (0 = Fast Decline, 5 = Stable, 10 = Fast Growth)
Further discussion on the assessment is included in the Appendix at the end of the article.
What we saw was particularly interesting. A trendline developed, exposing a tradeoff between ease of learning and DS4B capability rating. The most flexible tools are more difficult to learn but tend to have higher business capability. Conversely, the “easy-to-learn” tools are often not the best long-term tools for business or data science capability. Our opinion: go for capability over ease of use.
Of the top tools in capability, R has the best mix of desirable attributes: high data science for business capability, low cost, and very fast growth. The only downside is the learning curve. The rest of the article explains why R is so great for business.
REASON 2: R IS DATA SCIENCE FOR NON-COMPUTER SCIENTISTS
If you are seeking high-performance data science tools, you really have two options: R or Python. When starting out, you should pick one. It’s a mistake to try to learn both. Your choice comes down to what’s right for you. The difference between R and Python has been described in numerous infographics and debates online, but the most overlooked factor is person-programming language fit. Don’t understand what we mean? Let’s break it down.
Fact 1: Most people interested in learning data science for business are not computer scientists. They are business professionals, non-software engineers (e.g. mechanical, chemical), and other technical-to-business converts. This is important because of where each language excels.
Fact 2: Most activities in business and finance involve communication. This comes in the form of reports, dashboards, and interactive web applications that allow decision makers to recognize when things are not going well and to make well-informed decisions that improve the business.
Now that we recognize what’s important, let’s learn about the two major players in data science.
Python is a general-purpose programming language developed by software engineers that has solid programming libraries for math, statistics and machine learning. Python has best-in-class tools for pure machine learning and deep learning, but lacks much of the infrastructure for subjects like econometrics and for communication tools such as reporting. Because of this, Python is well-suited for computer scientists and software engineers.
R is a statistical programming language developed by scientists that has open source libraries for statistics, machine learning, and data science. R lends itself well to business because of its depth of topic-specific packages and its communication infrastructure. R has packages covering a wide range of topics such as econometrics, finance, and time series. R has best-in-class tools for visualization, reporting, and interactivity, which are as important to business as they are to science. Because of this, R is well-suited for scientists, engineers and business professionals.
WHAT SHOULD YOU DO?
Don’t make the decision tougher than it is. Think about where you are coming from:
Are you a computer scientist or software engineer? If yes, choose Python.
Are you an analytics professional or mechanical/industrial/chemical engineer looking to get into data science? If yes, choose R.
Think about what you are trying to do:
Are you trying to build a self-driving car? If yes, choose Python.
Are you trying to communicate business analytics throughout your organization? If yes, choose R.
REASON 3: LEARNING R IS EASY WITH THE TIDYVERSE
Learning R used to be a major challenge. Base R was a complex and inconsistent programming language. Structure and formality were not the top priority, as they are in other programming languages. This all changed with the “tidyverse”, a set of packages and tools that have a consistently structured programming interface.
When tools such as dplyr and ggplot2 came to fruition, they flattened the learning curve by providing a consistent and structured approach to working with data. As Hadley Wickham and many others continued to evolve R, the tidyverse came to be, which includes a series of commonly used packages for data manipulation, visualization, iteration, modeling, and communication. The end result is that R is now much easier to learn (we’ll show you in our next article!).
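To make the idea of a consistent interface a bit more concrete, here is a minimal sketch of a dplyr pipeline. It uses R's built-in mtcars data set rather than anything from this article, and the 100-horsepower cutoff and the column names mean_mpg and n_cars are arbitrary assumptions chosen for illustration.

```r
library(dplyr)

# Each verb takes a data frame and returns a data frame,
# so the steps chain together and read top to bottom.
mpg_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                 # keep cars above an arbitrary horsepower cutoff
  group_by(cyl) %>%                    # one group per cylinder count
  summarise(
    mean_mpg = mean(mpg),              # average fuel economy per group
    n_cars   = n(),                    # number of cars per group
    .groups  = "drop"
  ) %>%
  arrange(desc(mean_mpg))              # best fuel economy first

print(mpg_by_cyl)
```

The same verbs (filter, group_by, summarise, arrange) apply to any data frame, which is a large part of why the tidyverse feels easier to pick up than base R.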
R continues to evolve in a structured manner, with advanced packages that are built on top of the tidyverse infrastructure. A new focus is being placed on modeling and algorithms, which we are excited to see. Further, the tidyverse is being extended to cover topical areas such as text (tidytext) and finance (tidyquant). For newcomers, this should give you confidence in selecting this language. R has a bright future.
REASON 4: R HAS BRAINS, MUSCLE, AND HEART
Saying R is powerful is actually an understatement. From the business context, R is like Excel on steroids! But more important than just muscle is the combination of what R offers: brains, muscle, and heart.
We already talked about the infrastructure, the tidyverse, that enables the ecosystem of applications to be built using a consistent approach. It’s this infrastructure that brings life into your data analysis. The tidyverse enables:
Data manipulation (dplyr, tidyr)
Working with data types (stringr for strings, lubridate for date/datetime, forcats for categorical/factors)
Programming (purrr, tidyeval)
Communication (Rmarkdown, shiny)
REASON 5: R IS BUILT FOR BUSINESS
Two major advantages of R over every other programming language are that it can produce business-ready reports and machine learning-powered web applications. Neither Python nor Tableau nor any other tool can currently do this as efficiently as R can. The two capabilities we refer to are rmarkdown for report generation and shiny for interactive web applications.
Rmarkdown is a framework for creating reproducible reports that has since been extended to building blogs, presentations, websites, books, journals, and more. It’s the technology that’s behind this blog, and it allows us to include the code with the text so that anyone can follow the analysis and see the output right with the explanation. What’s really cool is that the technology has evolved so much. Here are a few examples of its capability:
rmarkdown for generating HTML, Word and PDF reports
Shiny is a framework for creating interactive web applications that are powered by R. Shiny is a major consulting area for us as four of five assignments involve building a web application using shiny. It’s not only powerful, it enables non-data scientists to gain the benefit of data science via interactive decision making tools. Here’s an example of a Google Trend app built with shiny.
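As a rough illustration of how little code a basic shiny app needs, the sketch below wires a slider to a plot. It is not the Google Trends app mentioned above. The input name min_hp, the output name mpg_plot, and the use of the built-in mtcars data are all assumptions made for this example.

```r
library(shiny)

# User interface: one slider and one plot placeholder.
ui <- fluidPage(
  titlePanel("Fuel economy explorer (illustrative sketch)"),
  sliderInput("min_hp", "Minimum horsepower",
              min = 50, max = 300, value = 100),
  plotOutput("mpg_plot")
)

# Server logic: the plot re-renders whenever the slider moves.
server <- function(input, output, session) {
  output$mpg_plot <- renderPlot({
    # The slider tops out at 300 and mtcars has a car above that,
    # so this subset always has at least one row.
    cars <- mtcars[mtcars$hp >= input$min_hp, ]
    plot(cars$wt, cars$mpg,
         xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
  })
}

# Build the app object; call runApp(app) to launch it in a browser.
app <- shinyApp(ui = ui, server = server)
```

A real decision-making tool would swap mtcars for business data and the base plot for a model output, but the ui/server split stays the same.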
REASON 6: R COMMUNITY SUPPORT
Being a powerful language alone is not enough. To be successful, a language needs community support. We’ll hit on two ways that R excels in this respect: CRAN and the R Community.
CRAN: COMMUNITY-PROVIDED R PACKAGES
CRAN is like the Apple App Store, except everything is free, super useful, and built for R. With over 14,000 packages, it has almost everything you could possibly want, from machine learning to high-performance computing to finance and econometrics! The task views cover specific areas and are one way to explore R’s offerings. CRAN is community-driven, with top open source authors such as Hadley Wickham and Dirk Eddelbuettel leading the way. Package development is a great way to contribute to the community, especially for those looking to showcase their coding skills and give back!
You begin with R because of its capability, you stay with R because of its community. The R Community is the coolest part. It’s tight-knit, opinionated, fun, silly, and highly knowledgeable… all of the things you want in a high performing team.
A really cool thing about R is that many major cities have a meetup nearby. Meetups are exactly what you think: a group of R users getting together to talk R. They are usually funded by the R Consortium. You can get a full list of meetups here.
R has a wide range of benefits, making it our obvious choice for Data Science for Business (DS4B). That’s not to say that Python isn’t a good choice as well, but for the wide range of needs in business, there’s nothing that compares to R. In this article we saw why R is a great choice. In the next article we’ll show you how to learn R in 12 weeks.
ABOUT BUSINESS SCIENCE
Business Science specializes in “ROI-driven data science”. Our focus is machine learning and data science in business and financial applications. We build web applications and automated reports to put machine learning in the hands of decision makers. Visit the Business Science website or contact us to learn more!
BUSINESS SCIENCE UNIVERSITY
Interested in learning data science for business? Enroll in Business Science University. We’ll teach you how to apply data science and machine learning in real-world business applications. We take you through the entire process of modeling problems, creating interactive data products, and distributing solutions within an organization. We are launching courses in early 2018!
APPENDIX
Here’s some additional information on the tool assessment. We have provided the code used to make the visualization, the criteria explanation, and the tool assessment.
Our assessment of the most powerful DS4B tools was based on four criteria:
Business Capability (1 = Low, 10 = High): How well-suited is the tool for use in the business? Does it include features the business needs, including advanced analytics, interactivity, communication, and web apps?
Ease of Learning (1 = Difficult, 10 = Easy): How easy is it to pick up? Can you learn it in a week of short courses or will it take a longer time horizon to become proficient?
Cost (Free/Minimal, Low, High): Cost has two undesirable effects. The first-order effect is that the organization has to spend money. This is not undesirable in and of itself, because the software companies can theoretically spend that money on R&D and other efforts to advance the product. The second-order effect, lower adoption, is much more concerning. High-cost tools tend to have much less discussion in the online world, whereas open source or low-cost tools have strong growth trends.
Trend (0 = Fast Decline, 5 = Stable, 10 = Fast Growth): We used Stack Overflow Insights question counts as a proxy for the trend of usage over time. A major assumption is that a growing number of Stack Overflow questions means usage is growing at a similar rate.
R
DS4B Capability = 10: Has it all. Great data science capability, great visualization libraries, Shiny for interactive web apps, rmarkdown for professional reporting.
Learning Curve = 4: A lot to learn, but learning is getting easier with the tidyverse.
Trend = 10: Stack Overflow questions are growing at a very fast pace.
Cost = Low: Free and open source
PYTHON
DS4B Capability = 7: Has great machine learning and deep learning libraries. Can connect to any major database. Communication is limited to Flask/Django web applications, which can be difficult to build. Does not have a business reporting infrastructure comparable to rmarkdown.
Learning Curve = 4: A lot to learn, but learning is relatively easy compared to other object-oriented programming languages like Java.
Trend = 10: Stack Overflow questions are growing at a very fast pace.
Cost = Low: Free and open source
EXCEL
DS4B Capability = 4: Mainly spreadsheet software, but has programming built in with VBA. Integrating R is difficult but possible. No data science libraries.
Learning Curve = 10: Relatively easy to become an advanced user.
Trend = 7: Stack Overflow questions are growing at a relatively fast pace.
Cost = Low: Comes with Microsoft Office, which most organizations use.
TABLEAU
DS4B Capability = 6: Has R integrated, but is very difficult to implement advanced algorithms and not as flexible as R+shiny.
Learning Curve = 7: Very easy to pick up.
Trend = 6: Stack Overflow questions are growing at a relatively fast pace.
Cost = Low: Free public version. Enterprise licenses are relatively affordable.
POWER BI
DS4B Capability = 5: Similar to Tableau, but not quite as feature-rich. Can integrate R to some extent.
Learning Curve = 8: Very easy to pick up.
Trend = 6: Expected to have same trend as Tableau.
Cost = Low: Free public version. Licenses are very affordable.
MATLAB
DS4B Capability = 6: Can do a lot with it, but lacks the infrastructure to use for business.
Learning Curve = 2: Matlab is quite difficult to learn.
Trend = 1: Stack Overflow questions are declining at a rapid pace.
Cost = High: Matlab licenses are very expensive. Licensing structure does not scale well.
SAS
DS4B Capability = 8: Has data science, database connection, business reporting and visualization capabilities. Can also build applications. However, limited by closed-source nature. Does not get the latest technologies like TensorFlow and H2O.
Learning Curve = 4: Similar to most data science programming languages for the tough stuff. Has a GUI for the easy stuff.
Trend = 3: Stack Overflow questions are declining.
Cost = High: Expensive for licenses. Licensing structure does not scale well.
CODE FOR THE DS4B TOOL ASSESSMENT VISUALIZATION
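The original plotting code is not reproduced here, so what follows is only a sketch of how the scores listed in this appendix could be plotted with ggplot2. The tool labels are inferred from the descriptions in the appendix and should be treated as assumptions, as should the column names, the colors, and the choice of a simple linear fit for the trendline. It is not necessarily how the original figure was made.

```r
library(ggplot2)

# Scores copied from the tool assessment in this appendix.
# Tool names are inferred from the descriptions (assumption).
ds4b_tools <- data.frame(
  tool             = c("R", "Python", "Excel", "Tableau",
                       "Power BI", "Matlab", "SAS"),
  capability       = c(10, 7, 4, 6, 5, 6, 8),
  ease_of_learning = c(4, 4, 10, 7, 8, 2, 4),
  trend            = c(10, 10, 7, 6, 6, 1, 3),
  cost             = factor(c("Low", "Low", "Low", "Low",
                              "Low", "High", "High"),
                            levels = c("Low", "High"))
)

# Capability against ease of learning: point size shows trend,
# color shows cost, and a dashed linear fit shows the tradeoff.
assessment_plot <- ggplot(ds4b_tools,
                          aes(x = ease_of_learning, y = capability)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              colour = "grey50", linetype = "dashed") +
  geom_point(aes(colour = cost, size = trend), alpha = 0.8) +
  geom_text(aes(label = tool), vjust = -1.5) +
  labs(
    title  = "DS4B tool assessment (illustrative sketch)",
    x      = "Ease of learning (1 = difficult, 10 = easy)",
    y      = "Business capability (1 = low, 10 = high)",
    colour = "Cost",
    size   = "Trend"
  ) +
  theme_minimal()

print(assessment_plot)
```

With these scores the fitted line slopes downward, which matches the tradeoff between ease of learning and capability described under Reason 1.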