Querying the Bitcoin blockchain with R

The crypto-currency Bitcoin and the way it generates “trustless trust” is one of the hottest topics when it comes to technological innovations right now. The way Bitcoin transactions always backtrace the whole transaction list since the first discovered block (the Genesis block) does not only work for finance. The first startups such as Blockstream already work on ways how to use this mechanism of “trustless trust” (i.e. you can trust the system without having to trust the participants) on related fields such as corporate equity.

So you could guess that Bitcoin and especially its components the Blockchain and various Sidechains should also be among the most exciting fields for data science and visualization. For the first time, the network of financial transactions many sociologists such as Georg Simmel theorized about becomes visible. Although there are already a lot of technical papers and even some books on the topic, there isn’t much material that allows for a more hands-on approach, especially on how to generate and visualize the transaction networks.

The paper on “Bitcoin Transaction Graph Analysis” by Fleder, Kester and Pillai is especially recommended. It traces the FBI seizure of $28.5M in Bitcoin through a network analysis.

So to get you started with R and the Blockchain, here’s a few lines of code. I used the package “Rbitcoin” by Jan Gorecki.

Here’s our first example, querying the Kraken exchange for the exchange value of Bitcoin vs. EUR:

library(Rbitcoin)
## Loading required package: data.table
## You are currently using Rbitcoin 0.9.2, be aware of the changes coming in the next releases (0.9.3 - github, 0.9.4 - cran). Do not auto update Rbitcoin to 0.9.3 (or later) without testing. For details see github.com/jangorecki/Rbitcoin. This message will be removed in 0.9.5 (or later).
wait <- antiddos(market = 'kraken', antispam_interval = 5, verbose = 1)
market.api.process('kraken',c('BTC','EUR'),'ticker')
##    market base quote           timestamp market_timestamp  last     vwap
## 1: kraken  BTC   EUR 2015-01-02 13:12:03             <NA&gt; 263.2 262.9169
##      volume    ask    bid
## 1: 458.3401 263.38 263.22

The function antiddos makes sure that you’re not overusing the Bitcoin API. A reasonable query interval should be one query every 10s.

Here’s a second example that gives you a time-series of the lastest exchange values:

trades <- market.api.process('kraken',c('BTC','EUR'),'trades')
Rbitcoin.plot(trades, col='blue')

btc_kraken

The last two examples all were based on aggregated values. But the Blockchain API allows to read every single transaction in the history of Bitcoin. Here’s a slightly longer code example on how to query historical transactions for one address and then mapping the connections between all addresses in this strand of the Blockchain. The red dot is the address we were looking at (so you can change the value to one of your own Bitcoin addresses):

wallet <- blockchain.api.process('15Mb2QcgF3XDMeVn6M7oCG6CQLw4mkedDi')
seed <- '1NfRMkhm5vjizzqkp2Qb28N7geRQCa4XqC'
genesis <- '1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa'
singleaddress <- blockchain.api.query(method = 'Single Address', bitcoin_address = seed, limit=100)
txs <- singleaddress$txs

bc <- data.frame()
for (t in txs) {
  hash <- t$hash
  for (inputs in t$inputs) {
    from <- inputs$prev_out$addr
    for (out in t$out) {
      to <- out$addr
      va <- out$value
      bc <- rbind(bc, data.frame(from=from,to=to,value=va, stringsAsFactors=F))
    }
  }
}

After downloading and transforming the blockchain data, we’re now aggregating the resulting transaction table on address level:

library(plyr)
btc <- ddply(bc, c("from", "to"), summarize, value=sum(value))

Finally, we’re using igraph to calculate and draw the resulting network of transactions between addresses:

library(igraph)
btc.net <- graph.data.frame(btc, directed=T)
V(btc.net)$color <- "blue"
V(btc.net)$color[unlist(V(btc.net)$name) == seed] <- "red"
nodes <- unlist(V(btc.net)$name)
E(btc.net)$width <- log(E(btc.net)$value)/10
plot.igraph(btc.net, vertex.size=5, edge.arrow.size=0.1, vertex.label=NA, main=paste("BTC transaction network for\n", seed))

btc_network

Street Fighting Trend Research

One of the most intriguing tools for the Street Fighting Data Science approach is the new Google Trends interface (formerly known as Google Insights for Search). This web application allows to analyze the volume of search requests for particular keywords over time (from 2004 on). This can be very useful for evaluating product life-cycle – assuming a product or brand that is not being searched on Google is no longer relevant. Here’s the result for the most important products in the Samsung Galaxy range:

For the S3 and S4 model the patterns are almost the same:

  • Stage 1: a slow build-up starting in the moment on the product was first mentioned in the Internet
  • Stage 2: a sudden burst at the product launch
  • Stage 3: a plateau phase with additional spikes when product modifications are launched
  • Stage 4: a slow decay of attention when other products have appeared

The S2 on the other hand does not have this sudden burst at launch while the Galaxy Note does not decay yet but displays multiple bursts of attention.

But in South Korea, the cycles seem quite different:

If you take a look at the relative numbers, the Galaxy Note is much stronger in South Korea and at the moment is at no. 1 of the products examined.

An interesting question is: do these patterns also hold for other mobile / smartphone brands? Here’s a look at the iPhone generations as searched for by Google users:

The huge spike at the launch of the iPhone 5 hints at the most successful launch in terms of Google attention. But this doesn’t say anything about the sentiment of the attention. Interestingly enough, the iPhone 5 had a first burst at the same moment the iPhone 4S has been launched. The reason for this anomany: people were expecting that Apple would be launching the iPhone 5 in Sep/Oct 2011 but then were disappointed that the Cupertino launch event was only about a iPhone 4S.

Analyses like this are especially useful at the beginning of a trend research workflow. The next steps would involve digging deeper in the patterns, taking a look at the audiences behind the data, collecting posts, tweets and news articles for the peaks in the timelines, looking for correlations of the timelines in other data sets e.g. with Google Correlate, brand tracking data or consumer surveys.