Illumined Insights
Streamline Your Financial Data Analysis with Tidyquant: A Guide to the R Package

Efficient Financial Data Analysis with Tidyquant in R

Stephen Hill
February 14, 2023

This week we turn our attention to another of my favorite R packages. The “tidyquant” package is used to gather financial data and present it in a “tidy” manner. I hope you enjoy this week’s newsletter, and I welcome your feedback.

Stephen Hill, Ph.D.

If you’re an R user who’s interested in working with financial data, the “tidyquant” package is an excellent resource. It integrates the best features of the “tidyverse” and “quantmod” packages, making it easy to access and manipulate financial data. With “tidyquant”, you can quickly import and tidy data from multiple sources, visualize it, and perform basic financial analysis.

https://business-science.github.io/tidyquant/

To get started with the “tidyquant” package, you’ll need to install the package from CRAN (as we do with most R packages). You can do this by running the following code:

install.packages("tidyquant")

Once the package is installed, you can load it into your R session with the library() function:

library(tidyquant)

Make sure that you have also installed (if needed) and loaded the “tidyverse” package.
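
If you are not sure whether both packages are already installed, a minimal sketch like the one below checks for them, installs whichever is missing, and then loads both. The variable names pkgs and not_installed are just placeholders for this example.

```r
# Check which of the two packages are missing, install those, then load both.
pkgs = c("tidyverse", "tidyquant")
not_installed = pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(not_installed) > 0) install.packages(not_installed)

library(tidyverse)  # dplyr (summarise, %>%) and ggplot2
library(tidyquant)  # tq_get(), tq_transmute(), geom_ma()
```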

Now, let’s take a look at some of the features of “tidyquant”. One of the most useful features is the ability to quickly and easily obtain and import financial data from multiple sources. For example, you can import stock data from Yahoo Finance with the “tq_get()” function. In the code below we get historical stock price data for Apple (AAPL) and store the data in an object called “stock_data”:

stock_data = tq_get("AAPL", get="stock.prices")

Called without a start date, as here, this returned AAPL daily stock price data from 2013 to the present. In R, the first few rows of this data look like:

First six rows of AAPL stock price data
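
If you want the same window every time you run the code, you can pin the dates yourself. The sketch below is one way to do that and then inspect the result. The from and to dates and the object name aapl_fixed are assumptions chosen for illustration, not values from the original call.

```r
library(tidyquant)

# Hypothetical fixed window; adjust the dates to suit your analysis
aapl_fixed = tq_get("AAPL",
                    get  = "stock.prices",
                    from = "2013-01-01",
                    to   = "2022-12-31")

head(aapl_fixed)   # first six rows
names(aapl_fixed)  # includes date, open, high, low, close, volume, adjusted
```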

You can also import economic data from sources such as FRED (Federal Reserve Economic Data) with the “tq_get()” function. Here we get Real Gross Domestic Product (GDP) data:

economic_data = tq_get("GDPC1", get="economic.data")
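
Economic series come back in the same tidy shape, with the observed values in a column named price. As a rough sketch (the object name gdp_plot and the axis labels are my own choices), the GDP series can be plotted like this:

```r
library(tidyquant)
library(ggplot2)

economic_data = tq_get("GDPC1", get = "economic.data")

# Values for economic series are stored in the "price" column
gdp_plot = ggplot(economic_data, aes(x = date, y = price)) +
  geom_line() +
  theme_bw() +
  xlab("Date") +
  ylab("Real GDP (GDPC1)")

gdp_plot
```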

Once your data (stock or general financial data) is imported, you can manipulate it with the powerful functions from the “tidyverse” package. For example, you can quickly summarize your stock data with the “summarise()” function. Here we calculate the mean opening and closing prices for the Apple stock data that we just collected:

stock_data %>% 
  summarise(avg_open = mean(open),
            avg_close = mean(close))

This yields the following summary in R:

Mean daily open and close prices for AAPL
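
The same pattern extends to grouped summaries, such as one row per year. The sketch below uses a tiny made-up data frame (toy_prices, with invented prices, not real AAPL data) so that the arithmetic is easy to check by hand. With the real data you would start the pipeline from stock_data instead.

```r
library(dplyr)

# Hypothetical stand-in for stock_data; the prices are invented
toy_prices = data.frame(
  date  = as.Date(c("2021-12-30", "2021-12-31", "2022-01-03", "2022-01-04")),
  open  = c(10, 12, 20, 22),
  close = c(11, 13, 21, 25)
)

yearly_summary = toy_prices %>%
  mutate(year = as.integer(format(date, "%Y"))) %>%  # pull the year out of the date
  group_by(year) %>%
  summarise(avg_open  = mean(open),
            avg_close = mean(close))

yearly_summary
# 2021: avg_open = 11, avg_close = 12
# 2022: avg_open = 21, avg_close = 23
```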

You can also visualize your data with the “ggplot2” package (part of the “tidyverse” collection of packages). Here we look at the AAPL daily closing price from 2013 to the present:

ggplot(stock_data,aes(x = date, y = close)) + 
  geom_line() + theme_bw()

This results in the following plot:

Historical AAPL daily closing prices (2013-present)

We can also use the “tq_transmute()” function to calculate and then display metrics such as annual returns.

stock_data %>% 
  tq_transmute(select     = adjusted, 
               mutate_fun = periodReturn, 
               period     = "yearly", 
               type       = "arithmetic") %>%
  ggplot(aes(x=date, y=yearly.returns*100)) +
  geom_point() +
  geom_line() + 
  geom_hline(yintercept=0) +
  theme_bw() + ylim(-100,100) +
  xlab("Year") + 
  ylab("Annual Returns (%)")

This results in the chart:

AAPL annual returns
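
For intuition, an arithmetic (simple) return is the ending price divided by the starting price, minus one. The sketch below applies that formula to invented year-end prices (year_end is a hypothetical data frame) using only “dplyr”. It is meant to illustrate the formula, not to reproduce what “periodReturn” does internally.

```r
library(dplyr)

# Hypothetical year-end adjusted prices
year_end = data.frame(
  year     = c(2020, 2021, 2022),
  adjusted = c(100, 125, 100)
)

returns_check = year_end %>%
  mutate(yearly_return_pct = (adjusted / lag(adjusted) - 1) * 100)

returns_check
# yearly_return_pct: NA, 25, -20
```

The first row is NA because there is no prior year to compare against.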

We are not restricted to viewing a single stock at a time. We can obtain pricing data from several stocks at once. Here we collect two years of pricing data for SCHD and JEPI (two ETFs known for their dividend payments):

multiple_stocks = tq_get(c("SCHD", "JEPI"),
                      get  = "stock.prices",
                      from = "2021-01-01",
                      to   = "2022-12-31")

We can then plot the pricing data with each ETF's pricing shown in a different color:

ggplot(multiple_stocks,aes(x = date, y = close, color=symbol)) + 
  geom_line() + 
  scale_color_manual(values=c('#e04529', '#298ac6')) +
  theme_bw() + 
  xlab("Date") +
  ylab("Closing Price") +
  labs(color = "Symbol")

This results in the plot:

SCHD and JEPI historical pricing (2021-2022)
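
Because the data is tidy, per-symbol calculations only need a “group_by()” step. As a sketch, the code below computes monthly returns for each ETF from the adjusted price, which accounts for dividends and so is a sensible basis for comparing dividend-focused funds. The object names etf_prices and etf_returns and the choice of a monthly period are my own for this example.

```r
library(tidyquant)
library(dplyr)
library(ggplot2)

etf_prices = tq_get(c("SCHD", "JEPI"),
                    get  = "stock.prices",
                    from = "2021-01-01",
                    to   = "2022-12-31")

# One set of monthly returns per symbol
etf_returns = etf_prices %>%
  group_by(symbol) %>%
  tq_transmute(select     = adjusted,
               mutate_fun = periodReturn,
               period     = "monthly",
               type       = "arithmetic")

ggplot(etf_returns, aes(x = date, y = monthly.returns * 100, color = symbol)) +
  geom_line() +
  geom_hline(yintercept = 0) +
  theme_bw() +
  xlab("Month") +
  ylab("Monthly Returns (%)") +
  labs(color = "Symbol")
```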

Lastly, let’s look at how we can add a simple indicator to a chart. We start by gathering stock price data for Live Oak Bank (LOB).

liveoak = tq_get(c("LOB"),
                      get  = "stock.prices",
                      from = "2021-01-01",
                      to   = "2022-12-31")

Let’s plot the LOB closing prices and add the 50-day and 200-day simple moving averages. The “tidyquant” package includes a custom “geom” (“geom_ma”) that helps with this.

ggplot(liveoak, aes(x = date, y = close)) + 
  geom_line() + 
  geom_ma(ma_fun = SMA, n = 50, color = "blue") +
  geom_ma(ma_fun = SMA, n = 200, color = "red") + 
  theme_bw() + 
  xlab("Date") +
  ylab("Closing Price")

This code results in:

LOB 50 and 200 day simple moving averages
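
To see what a simple moving average does, here is a small sketch using the same SMA() function (from the “TTR” package, which is installed alongside “tidyquant”) on a handful of made-up closing prices. The closes vector is hypothetical.

```r
library(TTR)  # provides SMA()

closes = c(10, 11, 12, 13, 14, 15)  # hypothetical closing prices

sma3 = SMA(closes, n = 3)  # each value is the mean of the latest 3 closes
sma3
# NA NA 11 12 13 14
```

The first n - 1 values are NA because a full window is not yet available. This is why the 200-day line in the chart only starts once 200 trading days of data have accumulated.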

As you can see, the “tidyquant” package makes it easy to access and manipulate financial data in R. We have only scratched the surface with this newsletter article. Whether you’re a beginner or an expert, “tidyquant” has something for everyone! Give it a try today and see how it can help you with your financial data analysis.

Are you interested in learning more about data visualization using R? Click below to get notified about my upcoming book “Data Visualization in R”.

Data Visualization in R

Master the art of data visualization in R with this comprehensive guide to the ggplot2 and esquisse packages. Get hands-on experience with step-by-step examples and practice exercises to become an expert in creating visually appealing and informative graphs.

leanpub.com/datavisualizationinr

Each week we’ll feature a dataset that we find interesting, useful, etc. Last week we looked at the “oldie-but-goodie” Titanic dataset. This week, in honor of the Super Bowl, we suggest the “nflfastR” package for R. The “nflfastR” package is part of a broader set of packages known as the “nflverse”. The “nflfastR” package is used to collect and organize play-by-play data for the National Football League (NFL). This is powerful data that can be used to develop all kinds of analyses (e.g., in-game win probability models).

Explore the “nflfastR” package here:

nflfastR: An R package to quickly obtain NFL play-by-play data

A set of functions to access National Football League play-by-play data.

www.nflfastr.com

Feedback?

Did you enjoy this week’s newsletter? Do you have a topic, tool, or technique that you would like to see featured in a future edition? I’d love to hear from you!
