The United States Geological Survey (USGS) continuously monitors earthquakes and makes the corresponding data available to the public. A dataset containing all earthquakes worldwide over a 30-day time frame is available at http://www.hofroe.net/data/earthquakes.csv.
The accompanying codebook is available from the US Geological Survey (you should be able to answer all questions in this exam without it).
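# stringsAsFactors = TRUE keeps Country a factor (the default changed in R 4.0); the level recoding of Country below relies on this
eq <- read.csv("http://www.hofroe.net/data/earthquakes.csv", stringsAsFactors = TRUE)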
What is the time frame under consideration?
eq$Date <- lubridate::ymd(eq$Date)
summary(eq$Date)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## "2012-09-05" "2012-09-13" "2012-09-21" "2012-09-20" "2012-09-27" "2012-10-05"
The dates range from 2012-09-05 to 2012-10-05.
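As a quick arithmetic check of the 30-day time frame mentioned at the start, the two boundary dates from the summary output can be subtracted with base R date arithmetic:

```r
# Boundary dates taken from the summary output above
first_day <- as.Date("2012-09-05")
last_day  <- as.Date("2012-10-05")

# Subtracting two Dates gives a difftime; express it in days
span <- as.numeric(difftime(last_day, first_day, units = "days"))
span
```

The difference is 30 days. Counting both endpoints, the data touch 31 calendar dates.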
How many earthquakes were there?
nrow(eq)
## [1] 7162
There were 7,162 earthquakes.
When and where was the strongest earthquake?
eq[which.max(eq$Magnitude), c("Date", "Location")]
## Date Location
## 952 2012-09-30 9km WNW of San Agustin
The strongest earthquake occurred on 2012-09-30, 9 km WNW of San Agustin.
What was its magnitude?
max(eq$Magnitude)
## [1] 7.3
Its magnitude was 7.3.
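which.max() returns the position of the first maximum and skips missing values. If several earthquakes shared the largest magnitude, only one of them would be reported. The sketch below uses made-up magnitudes (hypothetical values, not from the dataset) to show this behaviour and one way to list all tied positions:

```r
# Hypothetical magnitudes, including a missing value and a tie at the top
mags <- c(2.1, NA, 7.3, 5.0, 7.3)

first_max <- which.max(mags)                        # first maximum only; NA is skipped
all_max   <- which(mags == max(mags, na.rm = TRUE)) # every position tied at the maximum

first_max
all_max
```

On the real data, eq[which(eq$Magnitude == max(eq$Magnitude, na.rm = TRUE)), c("Date", "Location")] would list every earthquake tied for the largest magnitude.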
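# Merge factor level 11 into "California" (assumes level 11 is an alternative label for California in this dataset; check levels(eq$Country) first)
levels(eq$Country)[11] <- "California"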
sort(table(eq$Country), decreasing=TRUE)[1:5]
##
## California Alaska British Virgin Islands
## 2957 1907 479
## Nevada Washington
## 242 207
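The recoding levels(eq$Country)[11] <- "California" gives a factor level a name that already exists, and R then merges the two levels. Selecting the level by name rather than by position makes the recoding less fragile. The sketch below uses a hypothetical duplicate label "CA"; the actual label at position 11 in the dataset is not shown in this document.

```r
# Hypothetical factor in which "CA" and "California" denote the same state
f <- factor(c("CA", "California", "Alaska", "CA"))

# Rename the level by name; the duplicated value "California" merges the two levels
levels(f)[levels(f) == "CA"] <- "California"

levels(f)
table(f)
```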
library(ggplot2)
library(dplyr)
library(forcats)
eq %>%
group_by(Country) %>%
tally() %>%
arrange(desc(n)) %>%
slice(1:5)
## # A tibble: 5 x 2
## Country n
## <fct> <int>
## 1 California 2957
## 2 Alaska 1907
## 3 British Virgin Islands 479
## 4 Nevada 242
## 5 Washington 207
eq %>%
group_by(Country) %>%
tally() %>%
arrange(desc(n)) %>%
slice(1:20) %>%
mutate(Country = fct_reorder(Country, n)) %>%
ggplot(aes(x = Country, weight = n)) +
geom_bar() +
coord_flip()
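With weight = n, geom_bar() sums the weights within each bar, so each bar's height equals the pre-computed count n. geom_col() with y = n draws the same bars directly. A small check with hypothetical counts:

```r
library(ggplot2)

# Hypothetical pre-tallied counts
counts <- data.frame(Country = c("A", "B", "C"), n = c(5, 3, 1))

p_bar <- ggplot(counts, aes(x = Country, weight = n)) + geom_bar() # height = sum of weights
p_col <- ggplot(counts, aes(x = Country, y = n)) + geom_col()      # height = n as given

# Computed bar heights of each plot
layer_data(p_bar)$y
layer_data(p_col)$y
```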
eq %>%
filter(!is.na(Country)) %>%
mutate(
Country10 = forcats::fct_lump(Country, 10),
Country10 = forcats::fct_reorder(Country10, Magnitude, na.rm=TRUE, .fun = median)
) %>%
ggplot(aes(x = Country10, y = Magnitude)) +
geom_boxplot() +
coord_flip()
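The boxplot relies on two forcats helpers. fct_lump() keeps the 10 most frequent countries/states, and fct_reorder() sorts them by median magnitude. The same idea can be sketched in base R. Note the assumptions:

* lump_top_n() is a hypothetical, simplified helper.
* It breaks ties at the cutoff arbitrarily, whereas fct_lump() handles ties differently.
* The data are made up.

```r
# Hypothetical simplified stand-in for forcats::fct_lump(f, n):
# keep the n most frequent levels and collapse the rest into "Other"
lump_top_n <- function(f, n) {
  f <- as.character(f)
  counts <- sort(table(f), decreasing = TRUE)        # NAs are not counted
  keep <- names(counts)[seq_len(min(n, length(counts)))]
  out <- ifelse(f %in% keep, f, "Other")
  out[is.na(f)] <- NA                                # keep missing values missing
  lumped <- setdiff(unique(out[!is.na(out)]), keep)  # "Other", if anything was lumped
  factor(out, levels = c(keep, lumped))
}

# Hypothetical countries and magnitudes
country <- c("a", "a", "a", "b", "b", "c")
mag     <- c(1, 2, 3, 5, 6, 4)

country2 <- lump_top_n(country, 2)
# Order levels by median magnitude (base R analogue of fct_reorder)
country2 <- reorder(country2, mag, FUN = median, na.rm = TRUE)
levels(country2)
```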
eq %>% ggplot(aes(x = Magnitude)) + geom_histogram(binwidth = 0.1)
# use a magnitude of 4 as the cutoff between 'small' and 'large' earthquakes
eq$size <- c("small", "large")[(eq$Magnitude >= 4)+1]
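The classification above uses logical-to-integer coercion. (Magnitude >= 4) + 1 turns FALSE into 1, which selects "small", and TRUE into 2, which selects "large". A missing magnitude stays NA. A sketch with hypothetical magnitudes compares this with the more explicit ifelse() formulation:

```r
# Hypothetical magnitudes: one exactly at the cutoff and one missing
m <- c(2.5, 4.0, 5.1, NA)

# Index trick: FALSE + 1 = 1 -> "small", TRUE + 1 = 2 -> "large", NA -> NA
size_index <- c("small", "large")[(m >= 4) + 1]

# Equivalent, arguably more readable, formulation
size_ifelse <- ifelse(m >= 4, "large", "small")

size_index
```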
Load the maps package and extract a world map (hint: think of map_data). Plot the world map using a polygon layer. Set the fill color to grey50. Add a layer of points to the map showing the locations of earthquakes, and use color to distinguish between small and large earthquakes. Describe what you see.
library(maps)
world <- map_data("world")
worldmap <- world %>% ggplot(aes(x = long, y = lat, group=group)) +
geom_polygon(fill = "grey50")
worldmap + geom_point(aes(x = Longitude, y = Latitude, colour = size, group=1), data = eq)
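Most of the small earthquakes were recorded in the US. This is consistent with the counts above: California, Alaska, Nevada and Washington are four of the five locations with the most earthquakes.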
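worldmap sets group = group globally. The point layer would therefore inherit that mapping and look for a group column in eq, which is why the code above overrides it with group = 1. An alternative is inherit.aes = FALSE, which stops the point layer from inheriting any global aesthetics. The sketch below uses tiny hypothetical stand-ins for map_data("world") and eq, so it runs without downloading anything:

```r
library(ggplot2)

# Hypothetical stand-in for map_data("world"): a single square polygon
world_toy <- data.frame(long = c(0, 10, 10, 0), lat = c(0, 0, 10, 10), group = 1)

# Hypothetical stand-in for eq
eq_toy <- data.frame(
  Longitude = c(2, 5, 8),
  Latitude  = c(3, 5, 7),
  size      = c("small", "large", "small")
)

p <- ggplot(world_toy, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey50") +
  # inherit.aes = FALSE: do not look for long/lat/group in eq_toy
  geom_point(aes(x = Longitude, y = Latitude, colour = size),
             data = eq_toy, inherit.aes = FALSE)
```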
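Summarize the data by day. For each day, find the number of earthquakes, their average magnitude, and the name of the country/state in which most of them happened that day.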
Based on the summary data, draw a single chart that incorporates all of the above information.
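eq_stats <- eq %>%
group_by(Date) %>%
summarize(
n = n(), # number of earthquakes that day
Magnitude = mean(Magnitude, na.rm = TRUE), # average magnitude that day
Country = names(sort(table(Country), decreasing=TRUE))[1] # most frequent country/state that day
)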
ggplot(eq_stats, aes(x = Date, y = n, colour=Country, size=Magnitude)) +
geom_point()
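The same daily summary can be sketched in base R with split(), lapply() and rbind(). This may help clarify what group_by() and summarize() compute for each day. The data below are a hypothetical three-row stand-in for eq:

```r
# Hypothetical stand-in for eq: three earthquakes on two days
eq_toy <- data.frame(
  Date      = as.Date(c("2012-09-05", "2012-09-05", "2012-09-06")),
  Magnitude = c(2.0, 3.0, 4.5),
  Country   = c("Alaska", "Alaska", "California")
)

# Split by day, summarise each day, then bind the rows back together
daily <- lapply(split(eq_toy, eq_toy$Date), function(d) {
  data.frame(
    Date      = d$Date[1],
    n         = nrow(d),
    Magnitude = mean(d$Magnitude, na.rm = TRUE),
    # most frequent country; ties go to the first name in table() order
    Country   = names(which.max(table(d$Country)))
  )
})
eq_stats_toy <- do.call(rbind, daily)
eq_stats_toy
```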