Simple introduction to quantmod and tushare

When doing quantitative financial analysis, the first step is to obtain the related data. In the world of R, multiple packages are capable of doing this. In this post, we briefly introduce two packages, quantmod and tushare.

quantmod

The quantmod package is designed to assist the quantitative trader. The information at its official website is somewhat outdated with the news section being staged at 2009. But this package is still under development at Github.

Fetch the data

We can fetch the data of Huatai 300 ETF with

library(quantmod)
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: TTR
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
getSymbols("510300.SS", src = "yahoo") # Get the data: 300ETF of Huatai
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
## 
## This message is shown once per session and may be disabled by setting 
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
## [1] "510300.SS"
etf300 <- `510300.SS`    # rename the variable, note the ` sign around the name
rm(`510300.SS`)

In the previous script, the data is fetched from source Yahoo Finance and "510300.SS" is the code name for Huatai 300ETF at yahoo’s website. This function also supports different data source such as "google" for Google Finance, "FRED" for Federal Reserve Economic Data and "oanda" for Oanda.com. Also "csv" and "rda" for loading data from local source. By default, getSymbols will return the results in the name of the symbol, in this case, it’s "510300.SS", which is an unusual R variable name since it starts with a number 5. So we rename it to etf300.

Another way is to use setSymbolLookup to tell quantmod how we want our data to be fetched

setSymbolLookup(ETF300 = list(name = "510300.SS",    # now ETF300 is `510300.SS`
                              src = "yahoo"))

and we can check the settings for this symbol with

getSymbolLookup("ETF300")    # we can check our setting for this symbol
## $ETF300
## $ETF300$name
## [1] "510300.SS"
## 
## $ETF300$src
## [1] "yahoo"

Then we can fetch the data as usual:

getSymbols("ETF300")
## [1] "ETF300"

Save and load the data from local file

One can save the symbol data to local file for easy future access and analysis. This can be done with saveSymbols

saveSymbols("ETF300", file.path = ".")

This command will save the Huatai 300ETF data to a local file called "ETF300.RData" to the current working directory(file.path = "."). Then one can load it via getSymbols by setting src = "RData":

getSymbols("ETF300", src = "RData",    # we can load data from local file by src = "RData"
           dir = "./",                 # directory and file extension of local file
           extension = "RData")        # see documentation of getSymbols.rda()
## [1] "ETF300"

By default, this function will look for a file with extension .rda. We manually set it with extension = "RData".

Visualize the data

One advantage of quantmod is that it is very easy to visualize the data.

chartSeries(ETF300)

Thanks to xts package, it is easy to visualize a subset of the data

chartSeries(ETF300, subset = "2020-12-15::2021-01-18")    # adjust timeline

One can also add technical indicators from the TTR package to existing figures with

addMACD()    # add moving average plot

More examples can be found at quantmod’s example website.

Tushare

In China mainland, it’s somewhat inconvenient to use quantmod due to connection issues with Yahoo Finance and Google Finance. On the other hand, Tushare is a financial data platform mainly focused on China mainland. It’s first released open source in Dec 2014. And now the new version Tushare Pro has been released. Although it’s mainly focused on Python users, an R package tushare is also available at CRAN.

library(Tushare)
## Loading required package: httr
## Loading required package: tidyverse
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.6     ✓ dplyr   1.0.4
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::first()  masks xts::first()
## x dplyr::lag()    masks stats::lag()
## x dplyr::last()   masks xts::last()
## Loading required package: forecast
## Loading required package: data.table
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## The following object is masked from 'package:purrr':
## 
##     transpose
## The following objects are masked from 'package:xts':
## 
##     first, last

A key difference between tushare and quantmod is that in order to acquire Tushare Pro’s personal API token, one has to register an free account. After that, we can initialize tushare API in R with

api <- Tushare::pro_api(token = "PUT YOUR TOKEN HERE")

Then we can query for 300ETF data in such a way

res <- api(api_name = 'fund_daily', ts_code = "510300.SH", start_date = "20200101")
str(res)
## 'data.frame':    271 obs. of  11 variables:
##  $ ts_code   : chr  "510300.SH" "510300.SH" "510300.SH" "510300.SH" ...
##  $ trade_date: chr  "20210210" "20210209" "20210208" "20210205" ...
##  $ pre_close : num  5.69 5.55 5.48 5.47 5.49 ...
##  $ open      : num  5.71 5.58 5.52 5.49 5.46 ...
##  $ high      : num  5.83 5.69 5.57 5.55 5.51 ...
##  $ low       : num  5.7 5.55 5.48 5.48 5.42 ...
##  $ close     : num  5.81 5.69 5.55 5.48 5.47 ...
##  $ change    : num  0.115 0.138 0.072 0.008 -0.011 -0.006 0.08 0.063 -0.037 -0.136 ...
##  $ pct_chg   : num  2.02 2.485 1.313 0.146 -0.201 ...
##  $ vol       : num  3419099 4182998 4750134 5600005 3000842 ...
##  $ amount    : num  1973984 2349538 2628084 3085297 1639395 ...
head(res)
##     ts_code trade_date pre_close  open  high   low close change pct_chg     vol
## 1 510300.SH   20210210     5.692 5.706 5.833 5.697 5.807  0.115  2.0204 3419099
## 2 510300.SH   20210209     5.554 5.575 5.694 5.550 5.692  0.138  2.4847 4182998
## 3 510300.SH   20210208     5.482 5.520 5.573 5.482 5.554  0.072  1.3134 4750134
## 4 510300.SH   20210205     5.474 5.486 5.547 5.476 5.482  0.008  0.1461 5600005
## 5 510300.SH   20210204     5.485 5.460 5.510 5.416 5.474 -0.011 -0.2005 3000842
## 6 510300.SH   20210203     5.491 5.491 5.525 5.464 5.485 -0.006 -0.1093 2819695
##    amount
## 1 1973984
## 2 2349538
## 3 2628084
## 4 3085297
## 5 1639395
## 6 1549207
skimr::skim(res)
Table 1: Data summary
Name res
Number of rows 271
Number of columns 11
_______________________
Column type frequency:
character 2
numeric 9
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
ts_code 0 1 9 9 0 1 0
trade_date 0 1 8 8 0 271 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
pre_close 0 1 4.52 0.57 3.53 3.98 4.69 4.92 5.69 ▇▇▇▇▃
open 0 1 4.52 0.57 3.54 3.98 4.70 4.94 5.71 ▇▆▇▇▃
high 0 1 4.56 0.58 3.58 4.01 4.74 4.96 5.83 ▇▅▇▅▃
low 0 1 4.48 0.56 3.49 3.95 4.64 4.89 5.70 ▇▆▇▆▃
close 0 1 4.53 0.57 3.53 3.98 4.70 4.93 5.81 ▇▅▇▅▂
change 0 1 0.01 0.07 -0.34 -0.03 0.00 0.04 0.33 ▁▁▇▂▁
pct_chg 0 1 0.14 1.47 -8.43 -0.61 0.12 0.89 7.34 ▁▁▇▂▁
vol 0 1 4467546.98 1946338.27 1628381.85 3102811.71 3967685.53 5430194.20 14710757.72 ▇▅▁▁▁
amount 0 1 2017224.92 880484.24 647069.16 1384610.85 1825584.80 2418173.43 6024138.89 ▇▇▂▁▁

Details about these API keys can be found at here and here.

NOTE: A free tushare account can only access partial of the tushare database. In order to have higher data accessing privilege, one should try to increase their accounts’ points. New free account gets 100 points, with some additional 20s via finishing user profile pages. Possible ways including contribution to the community such as bug report, feature request and development. Also making donation is another acceptable approach. Details about acquiring points can be found at here. Point requirements for accessing different types of data can be found at here.

Chao Cheng
Chao Cheng
Statistician

My research interests include applied statistics and machine learning.

Related