Thomas is a Senior Data Scientist at Pharmalex. He is passionate about the incredible possibilities that blockchain technology offers to make the world a better place. You can contact him on Linkedin or Twitter.
Milana is a Data Scientist at Pharmalex. She is passionate about the power of analytical tools to discover the truth about the world around us and guide decision making. You can contact her on Linkedin.
An HTML version of this article with high resolution figures and tables is available here. The code used to generate it is available on my Github.
What is the Blockchain: A blockchain is a growing list of records, called blocks, that are linked together using cryptography. It is used for recording transactions, tracking assets, and building trust between participating parties. Primarily known for Bitcoin and other cryptocurrency applications, blockchain is now used in almost all domains, including supply chain, healthcare, logistics, identity management… Some blockchains are public and can be accessed by everyone, while others are private. Hundreds of blockchains exist, each with its own specifications and applications: Bitcoin, Ethereum, Tezos…
What is Helium: Helium is a decentralized wireless infrastructure. It is a blockchain that leverages a decentralized global network of Hotspots. A hotspot is a kind of modem with an antenna that provides long-range connectivity (it can reach 200 times farther than conventional Wi-Fi!) between wireless “internet of things” (IoT) devices. These devices can be environmental sensors to monitor air quality or for agricultural purposes, localisation sensors to track bike fleets… Explore the ecosystem here. People are incentivized to install hotspots and become part of the network by earning Helium tokens, which can be bought and sold like any other cryptocurrency. To learn more about Helium, read this excellent article.
What is R: R language is widely used among statisticians and data miners for developing data analysis software.
This is the third article in a series on interacting with blockchains using R. Part I focused on some basic concepts related to blockchain, including how to read the blockchain data. Part II focused on how to track NFT transaction data and visualise it. If you haven’t read these articles, I strongly encourage you to do so to get familiar with the tools and terminology we use in this third article: Part I and Part II.
Helium is an amazing project. Unlike traditional blockchain-related projects, it is not just about finance: it has real-world applications. It is intended to help solve problems outside the crypto world, which is awesome! In the past, deploying a communication infrastructure was only possible for big companies. Thanks to the blockchain, this is now accessible to collectives of individuals.
While a lot of content is available about the coverage aspect of Helium and how to correctly position your antenna to maximize your revenue, little is available about the real use of the network by connected devices. In this article, we examine a current snapshot of the Helium blockchain by answering the following questions:
We will analyse all historical data from the first block of the blockchain up to the latest. We will generate some statistics and put emphasis on visualisation. I believe there is nothing better than a good graph to communicate a message.
To fetch the data, there are several possibilities:
When you work with big datasets, things can get very slow. Here are two tricks to speed them up a bit:
Work with packages/functions adapted to handle large datasets. To read the data, we use the fread function from the data.table package. It is much faster than read.table and takes care of decompressing files automatically. For data management operations, data.table is also much faster than the tidyverse, but I find code written with the latter much easier to read. That’s why I use the tidy approach by default and switch to data.table only when it struggles.
Try to keep only the data you need to save memory. Discard any data you won’t use, such as columns with unimportant attributes, and delete heavy objects you no longer need.
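As a minimal sketch of both tricks, the snippet below writes a tiny throwaway CSV (its content and the object names are made up for illustration), reads back only the columns of interest with the `select` argument of `fread`, and then removes a heavy temporary object.

```r
library(data.table)

# Write a small throwaway CSV (made-up content) so the example is self-contained
tmp <- tempfile(fileext = ".csv")
fwrite(data.table(address = c("a", "b", "c"),
                  owner   = c("x", "x", "y"),
                  payload = c("unused1", "unused2", "unused3")),
       tmp)

# Trick 1: read only the columns we need
hotspotsSmall <- fread(file = tmp, select = c("address", "owner"))
print(names(hotspotsSmall))

# Trick 2: delete heavy objects we no longer need and let R reclaim the memory
bigTemporary <- rnorm(1e6)
rm(bigTemporary)
invisible(gc())

unlink(tmp)
```

On a file of a few gigabytes, dropping unused columns at read time avoids ever holding them in memory, which is usually cheaper than reading everything and deleting columns afterwards.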
The code below reads chain data about the hotspots and performs some data management. We use the h3 package to convert Uber’s H3 indexes into latitude/longitude. H3 is a geospatial indexing system using a hierarchical hexagonal grid. H3 supports sixteen resolutions, and each finer resolution has cells with one seventh the area of the coarser resolution. Helium uses resolution 8. To give you an idea, at this resolution the earth is covered by 691,776,122 hexagons (see here).
# Load a few useful packages
library(knitr)
library(tidyverse)
library(data.table)
library(ggplot2)
library(gganimate)
library(hexbin)
library(h3)
library(lubridate)
library(sp)
library(rworldmap)
# Run these two lines prior to loading library(rayshader) to
# send output to RStudio Viewer rather than external X11 window
options(rgl.useNULL = TRUE,
rgl.printRglwidget = TRUE)
library(rgl)
library(rayshader)
### Retrieve info on the hotspots
dataHotspots <- fread(file = "data/gateway_inventory_01266692.csv.gz",
                      select = c("address", "owner", "first_timestamp", "location_hex")) %>%
  rename(hotspot = address,
         firstDate = first_timestamp) %>%
  filter(location_hex != "", # remove hotspots without location
         firstDate != as.POSIXct("1970-01-01 00:00:00", tz = "UTC")) %>% # a few hotspots appear to have been installed in 1970, this is obviously a mistake in the database
  mutate(data.frame(h3_to_geo(location_hex)), # get the centres (lat/lng) of the given H3 indexes
         hotspot = factor(hotspot),
         firstDate = round_date(firstDate, "day"), # resolution up to the day is sufficient
         owner = factor(owner)) %>%
  select(-location_hex)
saveRDS(dataHotspots, "data/dataHotspots.rds")
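The cell count quoted above for resolution 8 can be checked with a short calculation. The sketch below assumes the usual H3 construction (122 base cells at resolution 0, of which 12 are pentagons; each hexagon has 7 children and each pentagon 6), which leads to the closed form 2 + 120 × 7^res. The helper name `h3CellCount` is made up for this example and is not part of the h3 package.

```r
# Number of H3 cells at a given resolution, assuming 122 base cells at
# resolution 0 (110 hexagons + 12 pentagons), 7 children per hexagon and
# 6 children per pentagon. This gives the closed form 2 + 120 * 7^res.
h3CellCount <- function(res) {
  stopifnot(res >= 0, res <= 15)
  2 + 120 * 7^res
}

h3CellCount(0)                                         # 122
formatC(h3CellCount(8), format = "d", big.mark = ",")  # "691,776,122"
```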
This is what the hotspot dataset looks like. We have the address of the hotspot, the address of the owner (an owner is a Helium wallet to which several hotspots can be linked), the date the hotspot was first seen on the network and its location on the globe.
dataHotspots <- readRDS("data/dataHotspots.rds")
glimpse(dataHotspots)
## Rows: 621,562
## Columns: 5
## $ hotspot <fct> 11bKAMXCzGdqdLutdoTZfv4hfpneyNnW75mTdmJBHFJLEBp6wvW, 11oTSSF~
## $ owner <fct> 13kZdbjtPAubdaLdmmoVYb4d7yT4Q7AAmWNuJaPFz4bdr4agMtM, 14a9496~
## $ firstDate <dttm> 2021-11-24, 2021-07-23, 2021-07-06, 2021-12-03, 2021-11-09,~
## $ lat <dbl> 33.41169, 39.95760, 43.64406, 38.78957, 43.57758, 39.92243, ~
## $ lng <dbl> 119.9856937, -82.8584133, -79.4161326, -77.1403011, -116.160~
Table 1 shows a few descriptive statistics on the hotspot dataset.
dataHotspots %>%
summarise( `Date range` = paste(min(firstDate), max(firstDate), sep = " - "),
`Duration` = round(max(firstDate) - min(firstDate)),
`Total number of hotspots` = length(levels(hotspot)),
`Total number of owners` = length(levels(owner))) %>%
t() %>%
kable(caption = "Descriptive statistics on the content of the hotspot dataset.")
Date range | 2019-07-31 - 2022-03-15 |
Duration | 958 days |
Total number of hotspots | 621562 |
Total number of owners | 256114 |
The first statistic we calculate characterises how many hotspots people own. Since there are a lot of owners, showing all the combinations is not possible. Plotting a histogram of the distribution is not an option either, as it is extremely skewed (one owner has about 2000 hotspots!). Therefore, we bin the number of hotspots into categories (Table 2). Note that cut() builds right-closed intervals, so with the breaks used below the first bin, [1, 2], groups owners with one or two hotspots. We see that most owners (about 82%) own only one or two hotspots, but some own hundreds.
dataHotspots %>%
  group_by(owner) %>%
  summarise(n = n()) %>%
  mutate(`Number of hotspots per owner` = cut(n,
           breaks = c(1, 2, 3, 4, 5, 9, 50, Inf), # bins: [1,2], (2,3], (3,4], (4,5], (5,9], (9,50], (50,Inf]
           labels = c("1-2", "3", "4", "5", "6-9", "10-50", ">50"),
           include.lowest = TRUE)) %>%
  group_by(`Number of hotspots per owner`) %>%
  summarise(`Number of owners` = n()) %>%
  mutate(`Proportion (%)` = round(`Number of owners`/sum(`Number of owners`)*100,2)) %>%
  kable(caption = "Hotspots distribution across owners.")
Number of hotspots per owner | Number of owners | Proportion (%) |
---|---|---|
1-2 | 209600 | 81.84 |
3 | 16961 | 6.62 |
4 | 8977 | 3.51 |
5 | 5329 | 2.08 |
6-9 | 7965 | 3.11 |
10-50 | 6732 | 2.63 |
>50 | 550 | 0.21 |
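Since `cut()` builds right-closed intervals by default, it helps to check on a few values which counts land in which bin. The sketch below uses made-up counts. The second call shows an alternative set of breaks (an assumption for illustration, not the code used for Table 2) that isolates owners with exactly one hotspot.

```r
# Toy counts of hotspots per owner (made-up values)
n <- c(1, 2, 3, 5, 6, 9, 10, 50, 51)

# With right = TRUE (the default), each bin is (lower, upper];
# include.lowest = TRUE additionally closes the first bin on the left.
bins <- cut(n, breaks = c(1, 2, 3, 4, 5, 9, 50, Inf), include.lowest = TRUE)
print(data.frame(n, bin = bins))

# Starting the breaks at 0 gives one bin per exact count for 1 to 4
binsExact <- cut(n, breaks = c(0, 1, 2, 3, 4, 9, 50, Inf),
                 labels = c("1", "2", "3", "4", "5-9", "10-50", ">50"))
print(table(binsExact))
```

With the first set of breaks, the values 1 and 2 share the bin `[1,2]`; with the second, they fall into separate bins.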
There are more than 600k hotspots in the world, which is a lot. These hotspots didn’t appear in one day. In Figure 1, we visualize the growth of the network in terms of how many hotspots were added over time, using a cumulative plot. We see three phases: (1) a slow linear increase, (2) an exponential increase in the middle of 2021, followed by (3) a fast linear increase. In my opinion, the exponential phase could have continued further but saturated due to the limited hotspot supply caused by the worldwide chip shortage that followed the Covid pandemic. To give you an idea, there was a six-month lag between my hotspot order and its delivery.
nHotspotsPerDate <- dataHotspots %>%
group_by(firstDate) %>%
summarise(count = n())
ggplot(nHotspotsPerDate, aes(x = firstDate, y = cumsum(count))) +
geom_line() +
labs(title = "Growth of the network infrastructure",
y = "Total number of hotspots (cumulative)",
x = "Date") +
scale_y_continuous(labels = function(x) format(x, scientific = FALSE),
breaks = seq(0, 5*10^5, length = 6))
Since we have the geographic information for Helium hotspots, we can visualize where they are located. We start by creating an empty world map on which we overlay the hotspot data. Plotting all the individual hotspots on a map would be too much (there are more than 600k hotspots) - the data are easier to interpret when summarised. Here, we cluster the hotspots into hexagons using a function found on the web (function here) and then plot them using the geom_hex function from ggplot2 (Figure 2).
We can see that most hotspots are located in North America, Europe and Asia, mostly in big cities. There are practically no hotspots in Africa and Russia, and very few in South America. Surprisingly, we see a few hotspots in the middle of the ocean. It could be either a data issue or simply cheating: sadly, people have found ways to increase their rewards by spoofing their hotspot’s location.
# create an empty world map
world <- map_data("world")
map <- ggplot() +
geom_map(
data = world, map = world,
aes(long, lat, map_id = region)
) +
scale_y_continuous(breaks=NULL) +
scale_x_continuous(breaks=NULL) +
theme(panel.background = element_rect(fill='white', colour='white'))
# bin the hotspots into hexagons
makeHexData <- function(df, nbins, xbnds, ybnds) {
  h <- hexbin(df$lng, df$lat, nbins, xbnds = xbnds, ybnds = ybnds, IDs = TRUE)
  data.frame(hcell2xy(h),
             count = tapply(df$hotspot, h@cID, FUN = function(z) length(z)), # number of rows (hotspots or transactions) falling in each hexagon
             cid = h@cell)
}
# find the bounds for the complete data
xbndsHotspot <- range(dataHotspots$lng)
ybndsHotspot <- range(dataHotspots$lat)
nHotspotsHexbin <- dataHotspots %>%
group_modify(~ makeHexData(.x, nbins = 500,
xbnds = xbndsHotspot,
ybnds = ybndsHotspot))
map +
geom_hex(aes(x = x, y = y, fill = count),
stat = "identity",
data = nHotspotsHexbin) +
scale_fill_distiller(palette = "Spectral", trans = "log10") +
labs(title = "Hotspots localisation in the world",
fill = "Number of hotspots") +
theme(legend.position = "bottom")
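To make the binning step more concrete, here is a minimal sketch of what `hexbin` returns on made-up coordinates. The toy data frame and the object names are assumptions for illustration; only `hexbin()`, `hcell2xy()` and the `count`, `cell` and `cID` slots come from the hexbin package.

```r
library(hexbin)

set.seed(1)
# Made-up coordinates standing in for hotspot longitudes/latitudes
toy <- data.frame(lng = runif(1000, -10, 10),
                  lat = runif(1000, 40, 50))

# IDs = TRUE keeps, for every point, the id of the hexagon it falls in (slot cID)
h <- hexbin(toy$lng, toy$lat, xbins = 10, IDs = TRUE)

# One row per non-empty hexagon: centre coordinates and number of points
hexSummary <- data.frame(hcell2xy(h), count = h@count, cid = h@cell)
print(head(hexSummary))

# Every point is assigned to exactly one hexagon
print(sum(hexSummary$count) == nrow(toy))
```

Each row of `hexSummary` can then be drawn as one hexagon by `geom_hex(stat = "identity")`, which is why only a few thousand shapes are plotted instead of hundreds of thousands of points.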
In addition to visualisation, it is always useful to provide some numbers. Below we summarise the proportion of hotspots per continent. For this, we leverage the rworldmap package with a custom function from here, which maps a longitude/latitude pair to the name of the continent/country it belongs to. Table 3 shows that nearly half of the hotspots are located in North America, followed by Europe with about 32% and Asia with 16%. Note the Undefined group, which probably refers to hotspots located either in the middle of the ocean or along continent borders. Note also the four hotspots in… Antarctica.
# The single argument of the function below, "points", is a data.frame in which:
# - column 1 contains a hotspot's longitude in degrees
# - column 2 contains a hotspot's latitude in degrees
coords2continent = function(points)
{
countriesSP <- getMap(resolution='low')
# "SpatialPoints" converts points to a SpatialPoints object
pointsSP = SpatialPoints(points, proj4string=CRS(proj4string(countriesSP)))
# use 'over' to get indices of the Polygons object containing each point
indices = over(pointsSP, countriesSP)
return(data.frame(continent = indices$REGION, country = indices$ADMIN))
}
dataHotspots <- dataHotspots %>%
mutate(coords2continent(data.frame(.$lng, .$lat)),
continent = replace_na(as.character(continent), "Undefined"),
continent = factor(continent))
dataHotspots %>%
group_by(continent) %>%
summarise(count = n()) %>%
mutate(percentage = round(count/sum(count)*100,2)) %>%
arrange(desc(count)) %>%
kable(caption = "Hotspots distribution per continent.")
continent | count | percentage |
---|---|---|
North America | 287899 | 46.32 |
Europe | 196797 | 31.66 |
Asia | 99828 | 16.06 |
Undefined | 18968 | 3.05 |
South America | 10496 | 1.69 |
Australia | 5528 | 0.89 |
Africa | 2042 | 0.33 |
Antarctica | 4 | 0.00 |
Now that we understand how the existing hotspots are distributed across the planet and among owners, it would be interesting to find out whether they are actively used by connected devices, and how often. To answer this question, let us download the full history of data transfers. This is a huge dataset (3 GB).
On Helium, you only pay for the data you use. Every 24 bytes sent in an uplink or downlink packet costs 1 Data Credit (DC) = $0.00001. To get an idea of how much the network is used, we can look at it from two perspectives: (1) check the volume of data exchanged and (2) check how often the hotspots have been involved in data transfer with connected devices.
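The pricing rule can be turned into a couple of small helpers. This is only a sketch: the function names are made up, and it assumes that a partly filled 24-byte chunk is billed as a full Data Credit, so converting Data Credits back to bytes gives an upper bound on the payload actually sent.

```r
bytesPerDC <- 24        # bytes covered by one Data Credit (from the text)
usdPerDC   <- 0.00001   # price of one Data Credit in USD (from the text)

# Data Credits needed for a payload, ASSUMING a partly filled
# 24-byte chunk is billed as a full Data Credit
dcForPayload <- function(bytes) ceiling(bytes / bytesPerDC)

# Upper bound on the bytes carried by a given number of Data Credits
bytesFromDC <- function(dc) dc * bytesPerDC

dcForPayload(c(10, 24, 25, 100))   # 1 1 2 5
dcForPayload(100) * usdPerDC       # cost in USD of a 100-byte packet
bytesFromDC(5)                     # 120
```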
### Retrieve the transferred packet dataset (transactions)
listFilesTransactions <- list.files("data/packets", pattern = "\\.csv\\.gz$", recursive = TRUE)
# we specify the columns we want to keep directly in the "fread" call to save memory
dataTransactions <- lapply(listFilesTransactions, function(f) {
  fread(file = file.path("data/packets", f),
        select = c("block", "transaction_hash", "time", "gateway", "num_dcs"))
})
dataTransactions <- dplyr::bind_rows(dataTransactions) %>%
mutate(bytes = 24 * num_dcs, # Every 24 bytes sent in an uplink or downlink packet cost 1 DC = $.00001.
date = as.POSIXct(time, origin = "1970-01-01"),
date = round_date(date, "day"), # reduce the precision of the date to ease the plotting
gateway = factor(gateway)) %>%
select(-time, -num_dcs, -transaction_hash) %>%
rename(hotspot = gateway)
# combine the hotspot and transaction datasets to include the hotspot location
# inner_join keeps only the rows whose hotspot is present in both datasets
dataTransactionsWithLocation <- inner_join(dataTransactions, dataHotspots, by = "hotspot") %>%
  mutate(hotspot = factor(hotspot, levels = levels(dataHotspots$hotspot))) %>% # this is to avoid dropping levels for hotspots not involved in any transaction
  select(-owner, -firstDate)
# remove these two big datasets to save memory
rm("dataHotspots")
rm("dataTransactions")
saveRDS(dataTransactionsWithLocation, "data/dataTransactionsWithLocation.rds")
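The effect of the join and of re-applying the factor levels is easier to see on a miniature example. The two data frames below are made up for illustration; only `inner_join()` and `factor()` mirror the calls used above.

```r
library(dplyr)

# Made-up miniature versions of the two tables
hotspots <- data.frame(hotspot = c("A", "B", "C"),
                       lat = c(50.8, 40.7, 48.9),
                       lng = c(4.4, -74.0, 2.3))
transactions <- data.frame(hotspot = c("A", "A", "B", "Z"),
                           bytes = c(24, 48, 120, 24))

# Only transactions whose hotspot exists in both tables survive the join:
# the row for the unknown hotspot "Z" is dropped
joined <- inner_join(transactions, hotspots, by = "hotspot")
print(joined)

# Restoring the full set of levels keeps hotspot "C" (no transaction) countable
joined$hotspot <- factor(joined$hotspot, levels = hotspots$hotspot)
print(table(joined$hotspot))
```

Keeping the unused level is what later allows hotspots with zero transactions to be counted.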
This is what the transaction dataset looks like. For each transaction, we have the block number, the address of the hotspot, the number of bytes transferred, the date, and the location of the hotspot.
dataTransactionsWithLocation <- readRDS("data/dataTransactionsWithLocation.rds")
glimpse(dataTransactionsWithLocation)
## Rows: 76,620,778
## Columns: 8
## $ block <int> 333619, 333669, 333958, 333958, 333958, 333958, 333958, 3339~
## $ hotspot <fct> 11tkAbgqHU2qU7GTiuwjggEDaYsmRDsbPsJjw5ezsu54coQE7Cu, 112DCTV~
## $ bytes <dbl> 24, 120, 216, 264, 4296, 48, 1920, 432, 2088, 29760, 2112, 4~
## $ date <dttm> 2020-05-15, 2020-05-15, 2020-05-15, 2020-05-15, 2020-05-15,~
## $ lat <dbl> 41.41625, 44.73126, 37.80697, 30.15410, 26.02164, 37.79103, ~
## $ lng <dbl> -122.38998, -68.82336, -122.27263, -95.40512, -80.17246, -12~
## $ continent <fct> North America, North America, North America, North America, ~
## $ country <fct> United States of America, United States of America, United S~
Table 4 shows a few descriptive statistics on the transaction dataset as well as the volume of data exchanged so far. Clearly, the amount of data exchanged between hotspots and connected devices is small: it is about as much as the data volume created by my smartphone in recent years. This metric does not seem to be a good indicator of Helium usage. Indeed, the network is not intended to transfer huge volumes of data but rather to transfer data across long distances at a low price. Below, we look at a second metric, which is more appropriate for quantifying Helium usage.
Another interesting fact: the first transaction occurred on 2020-05-15 while the first hotspot appeared on the network on 2019-07-31. This means there was a delay of about nine and a half months between the appearance of the first hotspot and the first transaction. There are two possible reasons: (1) my initial guess, that a critical number of hotspots was needed to convince connected device manufacturers to work with the network, and (2) data transfer was free in the beginning and DC transactions were only activated in April 2020 (more here).
dataTransactionsWithLocation %>%
summarise( `Date range` = paste(min(date), max(date), sep = " - "),
`Duration` = round(max(date) - min(date)),
`Block range` = paste(min(block), max(block), sep = " - "),
`Number of transactions` = n(),
`Total number of hotspots` = length(levels(hotspot)),
`Number of hotspots involved in at least one transaction` =
length(unique(hotspot)),
`Total data volume exchanged so far` =
paste(round(sum(bytes) / 1e+12, 3), "TB")) %>% # sum and convert bytes to terabytes (1 TB = 1e12 bytes)
t() %>%
kable(caption = "Summary statistics on the content of the transaction dataset.")
Date range | 2020-05-15 - 2022-03-14 |
Duration | 668 days |
Block range | 333619 - 1264997 |
Number of transactions | 76620778 |
Total number of hotspots | 621562 |
Number of hotspots involved in at least one transaction | 369946 |
Total data volume exchanged so far | 0.506 TB |
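The delay between the first hotspot and the first transaction can be checked directly from the two dates reported in Tables 1 and 4. The variable names are made up for this short sketch.

```r
firstHotspot     <- as.Date("2019-07-31")  # first hotspot seen on the network
firstTransaction <- as.Date("2020-05-15")  # first transaction in the dataset

gapDays <- as.numeric(firstTransaction - firstHotspot)
gapDays                     # 289
round(gapDays / 30.44, 1)   # 9.5 (30.44 = average number of days per month)
```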
To determine how often the hotspots have been involved in data transfer with connected devices, we can also analyse the total number of transactions. This is another metric of Helium usage. Each data transfer between a hotspot and a connected device corresponds to one transaction on the blockchain and one row in our dataset.
To summarise the evolution of this metric, we calculate the cumulative sum of the number of transactions per date and stratify it by continent. Overall, Figure 3 is very similar to Figure 1 above: a slow linear increase followed by an exponential increase, which is finally followed by a fast linear increase. The only difference is the glitch in November 2021, which is due to a major outage of the blockchain (here). Surprisingly, we see that despite having about 16% of the hotspots, Asia does not seem to be very active in terms of data transfer, in contrast to North America and Europe.
# count the number of transaction per date/continent and calculate a cumulative sum
nTransactionsPerDatePerContinent <- dataTransactionsWithLocation %>%
group_by(continent, date) %>%
summarise(count = n()) %>%
group_by(continent) %>%
arrange(date) %>%
mutate(cumsum = cumsum(count)) %>%
arrange(continent)
ggplot(nTransactionsPerDatePerContinent, aes(x=date, y=cumsum, fill=continent)) +
geom_area() +
labs(title = "Growth of the number of transactions between hotspots and devices",
y = "Number of transactions",
x = "Date") +
scale_y_continuous(labels = function(x) format(x, scientific = FALSE))
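The per-continent cumulative sum relies on the rows being ordered by date within each group. A minimal sketch on made-up counts (the data frame and its values are invented for illustration):

```r
library(dplyr)

# Made-up daily counts for two continents, deliberately out of order
toyCounts <- data.frame(
  continent = c("Europe", "Europe", "Europe", "Asia", "Asia"),
  date      = as.Date(c("2021-01-03", "2021-01-01", "2021-01-02",
                        "2021-01-02", "2021-01-01")),
  count     = c(5, 1, 2, 4, 3)
)

toyCumulative <- toyCounts %>%
  arrange(continent, date) %>%      # cumsum depends on row order, so sort first
  group_by(continent) %>%
  mutate(cumsum = cumsum(count)) %>%
  ungroup()

print(toyCumulative)
```

Within each continent the running total restarts from zero, which is what lets `geom_area` stack one curve per continent.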
This is confirmed by the distribution of the total number of transactions per continent: Asia represents only 3% of the total.
dataTransactionsWithLocation %>%
group_by(continent) %>%
summarise(count = n()) %>%
mutate(percentage = round(count/sum(count)*100,2)) %>%
arrange(desc(count)) %>%
kable(caption = "Distribution of the number of transactions per continent.")
continent | count | percentage |
---|---|---|
Europe | 35312062 | 46.09 |
North America | 34475784 | 45.00 |
Undefined | 3923928 | 5.12 |
Asia | 2331167 | 3.04 |
South America | 303896 | 0.40 |
Australia | 215152 | 0.28 |
Africa | 58788 | 0.08 |
Antarctica | 1 | 0.00 |
We can also look at where the top 10 most active hotspots are located. Note that we use the data.table syntax here instead of dplyr. As mentioned above, the dplyr syntax is preferred for its readability, but in this case data.table takes only 2 seconds while dplyr is much slower. We see that the most active hotspots are located in France, the US and Canada.
summaryTransactionPerHotspot <- dataTransactionsWithLocation[, .(`number of transactions` = .N),
by = c("hotspot", "country")] %>% # data.table syntax to speedup
arrange(desc(`number of transactions`))
summaryTransactionPerHotspot %>%
slice(1:10) %>%
kable(caption = "Localisation of the top 10 most active hotspots.")
hotspot | country | number of transactions |
---|---|---|
11etKgw9Lb6FndJnU17pKQVtsgbPJRvzE8eHny4J5f78NFvEXUD | France | 14275 |
11aWe6V6HSRpMKL5zHATKscLAfuDJoc3Q3kW82BYGnmnNJnHHXj | United States of America | 12254 |
11QxjZpR4Xbzb6mpjGo1F9mXLzbCNgDyteqjduSqJUmTarWnyx1 | France | 12159 |
112TQVbGWMQDM2TVYAbkPbvSWK9LFApBWCtkLjuuKfd9BBheoMp9 | United States of America | 11726 |
11c4pxUfwby5rtz2PtRm4oxmndc8WAcQg5BxT7CNpU56hHqvp9h | United States of America | 11699 |
112RMSnPo2bpJFdVoZAUxtAEYV9WuMTE6vw4PCgJeSbhuh56fG6G | United States of America | 11076 |
112kk7sLkuPybrPDE4ZPAYcAXzuPZbV3F2MH5adatdGohmRX5zJW | United States of America | 11064 |
112vq9i6viw7TLt5tzDm65k34Q4Lf1rPg3jwgYHd9CVxwadcNW4g | United States of America | 11042 |
11HhXZonK1sxhu6CEuXgqBfzjv2x7L2E81BFVZQjYHuE5pmiHMa | Canada | 10665 |
11o9QZnsx4sivpbm72BQGgzqmBtmVt2bbap2oE8DuzLfDMeL2w5 | United States of America | 10423 |
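For readers less familiar with the data.table syntax, here is a minimal sketch of the same grouped count on made-up transactions. The table content and column name `nTransactions` are invented; `.N` and `by` are standard data.table features.

```r
library(data.table)

# Made-up transactions
toyTransactions <- data.table(
  hotspot = c("A", "A", "B", "A", "C", "B"),
  country = c("France", "France", "Canada", "France", "Belgium", "Canada")
)

# .N is the number of rows in each group defined by `by`
perHotspot <- toyTransactions[, .(nTransactions = .N), by = .(hotspot, country)]

# order(-x) sorts in decreasing order
perHotspot <- perHotspot[order(-nTransactions)]
print(perHotspot)
```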
We can also calculate the proportion of hotspots involved in transactions and the median number of transactions per hotspot.
medianNumberOfTransactions <- median(summaryTransactionPerHotspot$`number of transactions`)
propWith0Transactions <- length(which(table(dataTransactionsWithLocation$hotspot) == 0)) /
length(levels(dataTransactionsWithLocation$hotspot)) * 100
The median number of transactions per hotspot (excluding hotspots that did not participate in any transaction) is 42, and 40.48% of hotspots have not participated in any transaction so far. We cannot really say that all hotspots are being exploited… Not yet! The network is still in its infancy and has a lot of spare capacity.
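Both numbers depend on the hotspot factor keeping a level for every known hotspot, including those without any transaction. A minimal sketch on made-up data (four known hotspots, only two of them active; the names are invented):

```r
# Made-up example: four known hotspots, only two appear in transactions
hotspot <- factor(c("A", "A", "A", "B"), levels = c("A", "B", "C", "D"))

# table() keeps unused levels with a count of 0
counts <- as.vector(table(hotspot))
counts                                          # 3 1 0 0

medianActive <- median(counts[counts > 0])      # median over active hotspots only
propInactive <- mean(counts == 0) * 100         # share of hotspots with no transaction

medianActive   # 2
propInactive   # 50
```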
Let us again visualise the number of transactions on the world map. We bin the data using the same makeHexData function and overlay the map with the number of data transactions. This time, we create a longitudinal animation using the gganimate package (Figure 4). Although direct comparison with Figure 3 is difficult since we have an additional dimension here (the color refers to the number of transactions), the message is similar. We see that transactions mainly occur in North America before mid 2020, followed by a strong wave in Europe and Asia. Hardly any transactions have occurred in South America and Africa.
## bin the transactions into hexagons
# find the bounds for the complete data
xbndsPacket <- range(dataTransactionsWithLocation$lng)
ybndsPacket <- range(dataTransactionsWithLocation$lat)
nTransactionsPerDateHexbin <- dataTransactionsWithLocation %>%
mutate(date = as.Date(round_date(date, "week"))) %>% # let's decrease the resolution to ease plotting
group_by(date) %>%
group_modify(~ makeHexData(.x,
nbins = 500,
xbnds = xbndsPacket,
ybnds = ybndsPacket))
pNumberOfTransactionsAnimated <- map +
geom_hex(aes(x = x, y = y, fill = count),
stat = "identity",
data = nTransactionsPerDateHexbin) +
scale_fill_distiller(palette = "Spectral",
trans = "log10") +
labs(title = "Evolution of the number of transactions",
fill = "Number of transactions") +
theme(legend.position = "bottom")
anim <- pNumberOfTransactionsAnimated +
transition_time(date) +
labs(title = "Date: {frame_time}",
subtitle = 'Frame {frame} of {nframes}')
animate(anim, nframes = length(unique(nTransactionsPerDateHexbin$date)))
To add a bit of visual perspective, we can also turn the plot into 3D using the awesome rayshader package. We focus on two countries: (1) the US, as it is the country with the largest number of hotspots and transactions, and (2) Belgium, which is my home country. As we intend to generate a static plot this time instead of an animation, we re-bin the data into hexagons. Note that it is possible to animate this 3D plot, but it takes a lot of computing time and fine-tuning (see this).
Figure 5 shows the US map. We see that transactions are homogeneously distributed across the country although the peaks of activity (note that the legend is logarithmic!) are located around big cities (New York, Los Angeles, San Francisco, Miami).
# get the US map
US <- map_data("usa")
mapUS <- ggplot() +
geom_map(
data = US, map = US,
aes(long, lat, map_id = region)
) +
scale_y_continuous(breaks=NULL) +
scale_x_continuous(breaks=NULL) +
theme(panel.background = element_rect(fill='white', colour='white'))
# filter to keep only US transactions
dataTransactionsWithLocationUS <- dataTransactionsWithLocation %>%
filter(country == "United States of America") %>%
filter(lng > -140) # there are a few hotspots far from the mainland
# find the bounds for the complete data
xbndsPacketUS <- range(dataTransactionsWithLocationUS$lng)
ybndsPacketUS <- range(dataTransactionsWithLocationUS$lat)
# bin onto hexagons
nTransactionsUS <- dataTransactionsWithLocationUS %>%
group_modify(~ makeHexData(.x,
nbins = 250,
xbnds = xbndsPacketUS,
ybnds = ybndsPacketUS))
# generate the plot
pNumberOfTransactionsUS <- mapUS +
geom_hex(aes(x = x, y = y, fill = count),
stat = "identity",
data = nTransactionsUS) +
scale_fill_distiller(palette = "Spectral", trans = "log10") +
labs(title = "Distribution of the transactions in US",
fill = "Number of transactions") +
theme(legend.position = "bottom")
# add the 3D
plot_gg(pNumberOfTransactionsUS,
multicore = TRUE,
width = 8,
height= 8,
zoom = 0.7,
theta = 0,
phi = 70,
raytrace = TRUE)
rgl::rglwidget(width = 1024, # this is to print the widget in the html document
height = 768)