Intro

Central Texas was hit with downpours over the Fourth of July weekend that led to flash floods. The Guadalupe River near Kerrville went from under 2 feet to over 34 feet in just over an hour. The death toll is at least 90 as of July 7, most of them in Kerr County.

This analysis covers flash flood fatalities from NOAA and stream gage data from USGS.

Here’s what we found:

Read below to see how we got these numbers.

Methods

We downloaded NOAA storm data through their bulk data download, getting annual files from 1950 through 2024 (StormEvents_details-ftp_v1.0).

Start by listing all the annual files and generating paths to import them.

dir <- "inputs/noaa/storm-events/"
base <- "StormEvents_details-ftp_v1.0_d"
seq <- 1950:2024
end <- "_c20250520.csv"

file_names <- data.frame(year = seq, 
                         file = paste0(dir, base, seq, end)) %>%
  mutate(file = ifelse(year == 2020, "inputs/noaa/storm-events/StormEvents_details-ftp_v1.0_d2020_c20250702.csv", file))

rm(dir, base, seq, end)
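
Before importing, it can help to confirm that every generated path points to a file on disk. The sketch below rebuilds the same path vector in base R and lists any missing files. `build_storm_paths()` is a hypothetical helper name, not part of the original analysis, and the check assumes the files sit under the same `inputs/` folder used above.

```r
# Hypothetical helper: one path per year, with optional per-year overrides
build_storm_paths <- function(years, dir, base, end, overrides = character()) {
  paths <- paste0(dir, base, years, end)
  names(paths) <- as.character(years)
  # swap in any year whose file carries a different creation stamp
  paths[names(overrides)] <- overrides
  paths
}

paths <- build_storm_paths(
  years = 1950:2024,
  dir = "inputs/noaa/storm-events/",
  base = "StormEvents_details-ftp_v1.0_d",
  end = "_c20250520.csv",
  overrides = c("2020" = "inputs/noaa/storm-events/StormEvents_details-ftp_v1.0_d2020_c20250702.csv")
)

# paths that do not exist on disk (empty when every file is present)
missing_paths <- paths[!file.exists(paths)]
```

If `missing_paths` is not empty, the import step that follows would fail on the first absent file, so this is a cheap check to run first.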

Now make a function to import and filter events for flash floods. Iterate it over all the files.

import_storms_fc <- function(path){
  
  x <- read_csv(path, guess_max = 100000) %>% clean_names()
  
  # filter flash floods
  # keep relevant columns
  x <- x %>%
    mutate(state = str_to_title(state)) %>%
    filter(event_type == "Flash Flood") %>%
    select(state, year, injuries_direct, injuries_indirect, deaths_direct, deaths_indirect,
           flood_cause, episode_narrative, event_narrative)
  
  return(x)
  
}

flash_floods <- lapply(file_names$file, import_storms_fc) %>% rbindlist()

The earliest flash flood record in this data is from 1996. Since then, there have been 1,923 direct deaths and 6,508 direct injuries.

Let’s aggregate this annual data by state.

# between 1996 and 2024
sum(flash_floods$injuries_direct)
## [1] 6508
sum(flash_floods$injuries_indirect)
## [1] 69
sum(flash_floods$deaths_direct)
## [1] 1923
sum(flash_floods$deaths_indirect)
## [1] 65
# group by state and remove territories
flash_floods_state_1996_2024 <- flash_floods %>%
  group_by(state) %>%
  summarise(injuries_direct = sum(injuries_direct, na.rm = T),
            deaths_direct = sum(deaths_direct, na.rm = T)) %>%
  filter(!state %in% c("American Samoa", "Puerto Rico", "Virgin Islands", "Guam"))

# quick chart of top 10 states
flash_floods_state_1996_2024 %>% 
  arrange(desc(deaths_direct)) %>%
  head(10) %>%
  mutate(state = factor(state, levels = rev(unique(state)))) %>%
  ggplot(aes(x = deaths_direct, y = state)) +
  geom_bar(stat = "identity", fill = "#1665CF") +
  theme_linedraw() +
  labs(title = "Top 10 States by Flash Flood Deaths (1996-2024)",
     x = "Direct Deaths",
     y = "")

Texas is the state with the most flash flood deaths. Let’s take a closer look at annual numbers. Between 1996 and 2024, the year with the most deaths was 2017, with 68.

# get an annual timeseries for texas
flash_floods_tx_year <- flash_floods %>%
  filter(state == "Texas") %>%
  group_by(year, state) %>%
  summarise(injuries_direct = sum(injuries_direct, na.rm = T),
            deaths_direct = sum(deaths_direct, na.rm = T))

# quick chart
flash_floods_tx_year %>%
  ggplot(aes(x = year, y = deaths_direct)) +
  geom_bar(stat = "identity", fill = "#1665CF") +
  theme_linedraw() +
  labs(title = "Texas Annual Flash Flood Deaths (1996-2024)",
     y = "Direct Deaths",
     x = "")

Here is the table for state data (combined years).

And the table for Texas annual data (1996-2024).

Now, let’s move to stream gage data from USGS. We’ll be accessing the data through the dataRetrieval package from the agency, but the same can be found on their website.

We’ll start with site 08166200, Guadalupe River at Kerrville, TX. Continuous gage height data (15-minute intervals) has been available since mid-2007. Daily gage height data, which includes mean/max/min aggregations, has been available since mid-1997.

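Query all daily gage height (parameter 00065) and then label each day with a category based on its maximum height: Action (7 to under 9 feet), Minor flooding (9 to under 12 feet), Moderate flooding (12 to under 20 feet), or Major flooding (20 feet and above).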

These thresholds are set by the local NWS office. Corresponding values for our site can be found here.

kerrville <- readNWISdv(siteNumbers = "08166200",
                        parameterCd = "00065",
                        startDate = "1997-01-01",
                        endDate = "2025-07-08",
                        statCd = c("00001", "00003")) # 00001 = Max, 00003 = Mean

# rename columns for clarity
kerrville <- kerrville %>%
  clean_names() %>%
  select(site_no, date, stage_height_max = x_00065_00001, stage_height_mean = x_00065_00003)

# label flooding categories based on daily max
# days below the 7 ft action stage stay NA
kerrville <- kerrville %>%
  mutate(category = case_when(
    stage_height_max >= 7 & stage_height_max < 9 ~ "Action",
    stage_height_max >= 9 & stage_height_max < 12 ~ "Minor flooding",
    stage_height_max >= 12 & stage_height_max < 20 ~ "Moderate flooding",
    stage_height_max >= 20 ~ "Major flooding"))
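
As a cross-check on the labeling step, the same thresholds can be expressed with base R's `cut()`. This is a minimal sketch on made-up heights, not gage records, and it is an alternative formulation rather than the method used in this analysis.

```r
# flood-stage thresholds in feet, taken from the case_when() call above
breaks <- c(7, 9, 12, 20, Inf)
labels <- c("Action", "Minor flooding", "Moderate flooding", "Major flooding")

# illustrative heights only, not real gage readings
heights <- c(2.5, 7, 8.99, 9, 15, 20, 30)

# right = FALSE makes each interval [lower, upper), matching the >= and < tests
category <- as.character(cut(heights, breaks = breaks, labels = labels, right = FALSE))
category
```

Heights below the first break fall outside every interval and come back as `NA`, which mirrors how unmatched rows behave in `case_when()`.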

Since October 1997, when daily gage height records at this site begin, the Guadalupe River at Kerrville has reached minor flooding on 4 days, moderate flooding on 6 days, and major flooding on 2 days.

The highest river height recorded there is 34.29 feet, on July 4, 2025.

# get a quick count
kerrville %>%
  group_by(category) %>%
  summarise(count = n())
## # A tibble: 5 × 2
##   category          count
##   <chr>             <int>
## 1 Action                8
## 2 Major flooding        2
## 3 Minor flooding        4
## 4 Moderate flooding     6
## 5 <NA>               9955
# table with full data
kerrville %>%
  arrange(-stage_height_max) %>% # can switch to arrange(desc(date))
  datatable(extensions = 'Buttons', options = list(
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf')))
# export river height for each minor/moderate/major flood
kerrville_flood_heights <- kerrville %>%
  filter(category != "Action") %>%
  select(-stage_height_mean) %>%
  arrange(-stage_height_max)

# write_csv(kerrville_flood_heights, "outputs/kerrville_flood_heights.csv")
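
One detail in the export step: `filter(category != "Action")` also drops the days with no category, because dplyr's `filter()` keeps only rows where the condition is `TRUE` and a comparison against `NA` yields `NA`. The base R sketch below, on an illustrative vector, shows the comparison and an explicit equivalent.

```r
# illustrative category values, including an unlabeled day
category <- c("Action", "Minor flooding", NA, "Major flooding")

# comparing NA with a string gives NA, not TRUE or FALSE
is_flood <- category != "Action"

# explicit version of the same selection: non-missing and not "Action"
flood_only <- category[!is.na(category) & category != "Action"]
flood_only
```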

Double-check that July 4 was the highest day by looking at the continuous data (15-minute intervals) for the site. The reading at 11:45 UTC on July 4, 2025 (6:45 am Central) is indeed the highest recorded in this data.

# this gets subdaily data
kerrville_uv <- readNWISuv(siteNumbers = "08166200",
                        parameterCd = "00065",
                        startDate = "2007-01-01",
                        endDate = "2025-07-07")

# check max height recorded
kerrville_uv %>%
  arrange(-X_00065_00000) %>%
  head(1)
##   agency_cd  site_no            dateTime X_00065_00000 X_00065_00000_cd tz_cd
## 1      USGS 08166200 2025-07-04 11:45:00         34.29                P   UTC
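
The `tz_cd` column in the output above shows that the timestamp is in UTC. A minimal base R sketch of expressing that instant in Central time follows; the variable names are illustrative.

```r
# peak timestamp as returned by the query, in UTC
peak_utc <- as.POSIXct("2025-07-04 11:45:00", tz = "UTC")

# same instant formatted in Central time (daylight time in July, UTC-5)
peak_central <- format(peak_utc, "%Y-%m-%d %H:%M", tz = "America/Chicago")
peak_central
```

Formatting with a `tz` argument changes only how the instant is displayed, not the underlying time.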

We can also look at the discharge (streamflow) data at this site. Continuous data (15-minute intervals) for the Guadalupe River at Kerrville goes back to mid-1996. Daily discharge data, which includes mean/max/min aggregations, has been available since mid-1986.

Query both as far back as they go.

# daily
kerrville_Q <- readNWISdv(siteNumbers = "08166200",
                        parameterCd = "00060",
                        startDate = "1986-01-01",
                        endDate = "2025-07-08",
                        statCd = c("00001", "00003")) # 00001 = Max, 00003 = Mean

# continuous
kerrville_uv_Q <- readNWISuv(siteNumbers = "08166200",
                        parameterCd = "00060",
                        startDate = "1996-01-01",
                        endDate = "2025-07-08")

# rename columns for clarity
kerrville_Q <- kerrville_Q %>%
  clean_names() %>%
  select(site_no, date, q_max = x_00060_00001, q_mean = x_00060_00003)

# and change UTC to central time zone
# with_tz() changes the displayed time zone without shifting the instant
kerrville_uv_Q <- kerrville_uv_Q %>%
  clean_names() %>%
  select(site_no, date_time, q_cfs = x_00060_00000) %>%
  mutate(date_time = lubridate::with_tz(date_time, tzone = "America/Chicago"))

At the start of July 4, 2025, the discharge of the Guadalupe River at Kerrville was 3.04 cubic feet per second (cfs). That rate would fill an Olympic pool (660,000 gallons, or 88,229 cubic feet) in 8.1 hours (88,229 cubic feet divided by 3.04 cfs, then divided by 3,600 to convert seconds to hours).

Soon after sunrise (7:30 am), it reached 134,000 cfs. That rate would fill the same pool in 0.66 seconds (88,229 cubic feet divided by 134,000 cfs).
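
The pool arithmetic can be reproduced in a few lines of R. This is a sketch: `fill_time_seconds()` is a hypothetical helper name, and the gallon-to-cubic-foot factor is the standard US gallon definition (231 cubic inches).

```r
# pool volume: 660,000 US gallons converted to cubic feet
gallons_per_pool <- 660000
ft3_per_gallon <- 231 / 1728        # 231 cubic inches per gallon, 1728 per cubic foot
pool_ft3 <- gallons_per_pool * ft3_per_gallon

# seconds needed to fill the pool at a given discharge in cfs
fill_time_seconds <- function(q_cfs, volume_ft3 = pool_ft3) volume_ft3 / q_cfs

hours_before_flood <- fill_time_seconds(3.04) / 3600   # pre-flood flow, in hours
seconds_at_peak <- fill_time_seconds(134000)           # peak flow, in seconds

round(hours_before_flood, 1)
round(seconds_at_peak, 2)
```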

That peak discharge was the second highest ever recorded by this stream monitor, with data going back to mid 1986. It’s worth noting that there is a gap in the data between 6:15 am and 7:30 am before discharge starts to decrease on July 4. That means the monitor may have missed even higher values, something that happens during extreme events.
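
Gaps like this one can be found by measuring the time between consecutive readings. The sketch below uses a handful of illustrative timestamps that mimic the 6:15 am to 7:30 am gap described above; it is not the gage data itself, and it assumes a 15-minute reporting interval.

```r
# illustrative timestamps only: 15-minute readings with one missing stretch
times <- as.POSIXct(c("2025-07-04 05:45", "2025-07-04 06:00", "2025-07-04 06:15",
                      "2025-07-04 07:30", "2025-07-04 07:45"),
                    tz = "America/Chicago")

# minutes between each reading and the one before it
step_min <- as.numeric(diff(times), units = "mins")

# a gap is any step longer than the expected 15-minute interval
is_gap <- step_min > 15
gaps <- data.frame(gap_start = times[-length(times)][is_gap],
                   gap_end   = times[-1][is_gap],
                   minutes   = step_min[is_gap])
gaps
```

Applied to the real continuous series, the same approach would list every stretch where the sensor reported less often than expected.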

Here is the continuous data showing the quick spike on July 4, 2025:

kerrville_uv_Q %>%
  filter(date_time >= as.POSIXct("2025-07-03 23:00", tz = "America/Chicago")) %>%
  arrange(date_time) %>%
  # table displayed is being wonky on timezones, split column instead
  mutate(date = as.Date(date_time),
         time = format(date_time, "%H:%M:%S")) %>%
  select(site_no, date, time, discharge_cfs = q_cfs) %>%
  datatable(extensions = 'Buttons', options = list(
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf')))

Here is the daily data showing the max discharge going back to 1986:

kerrville_Q %>%
  arrange(-q_max) %>%
  select(site_no, date, max_discharge_cfs = q_max, mean_discharge_cfs = q_mean) %>%
  datatable(extensions = 'Buttons', options = list(
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf')))

Finally, let’s assemble all the stream data (height and discharge) from sensors along the Guadalupe for the July 4 weekend so it can be exported.

# we will need continuous data to recreate the rapid spikes
# let's query sensors in the state and then we can narrow down

# get all sites in Texas with subdaily gage height data
tx_gauges <- whatNWISdata(stateCd = "TX", parameterCd = "00065", service = "uv")

# filter for active gauges along the Guadalupe River
guadalupe_gauges <- tx_gauges %>%
  filter(site_tp_cd == "ST", end_date >= "2025-07-08") %>%
  mutate(guadalupe = str_detect(station_nm, regex("guadalupe", ignore_case = T))) %>%
  filter(guadalupe == T)

# the gauges are in order starting upstream
# for now, use the first 6, which run from upstream of Hunt down to Comfort
gauges <- guadalupe_gauges %>% 
  head(6) %>%
  select(site_no, station_nm, lat = dec_lat_va, long = dec_long_va, 
         coord_datum = dec_coord_datum_cd)

# now query stage height and discharge data for these starting July 4 through July 6
stream_data_july_4th <- readNWISuv(siteNumbers = gauges$site_no,
                        parameterCd = c("00065", "00060"),
                        startDate = "2025-07-04",
                        endDate = "2025-07-06")

# rename columns for clarity 
# change UTC to central time zone
# and add station names, lat, long
stream_data_july_4th <- stream_data_july_4th %>%
  clean_names() %>%
  select(site_no, date_time, height_ft = x_00065_00000, discharge_cfs = x_00060_00000) %>%
  mutate(date_time = lubridate::with_tz(date_time, tzone = "America/Chicago")) %>%
  left_join(gauges, by = "site_no")

# check number of observations at each
# Hunt is at a 5-min frequency so it has more
# but there are very few at Center Point, so we'll remove it before exporting
stream_data_july_4th %>%
  group_by(station_nm) %>%
  summarise(observations = n())
## # A tibble: 6 × 2
##   station_nm                                observations
##   <chr>                                            <int>
## 1 Guadalupe Rv abv Bear Ck at Kerrville, TX          288
## 2 Guadalupe Rv at Comfort, TX                        286
## 3 Guadalupe Rv at Hunt, TX                           445
## 4 Guadalupe Rv at Kerrville, TX                      284
## 5 Guadalupe Rv nr Center Point, TX                    27
## 6 N Fk Guadalupe Rv nr Hunt, TX                      288
stream_data_july_4th <- stream_data_july_4th %>%
  filter(station_nm != "Guadalupe Rv nr Center Point, TX")

# there are gaps in data during peak flood times but it's all the data available
# write_csv(stream_data_july_4th, "outputs/stream_data_july_4th.csv")
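
To put the gaps in perspective, the observation counts above can be compared with the number of readings a complete three-day record would hold. This sketch assumes a 15-minute interval at every site except Hunt (5 minutes, per the comment above); the interval for the other sites is an assumption, not something confirmed by the query.

```r
# observation counts copied from the summary above
observations <- c(
  "Guadalupe Rv abv Bear Ck at Kerrville, TX" = 288,
  "Guadalupe Rv at Comfort, TX"               = 286,
  "Guadalupe Rv at Hunt, TX"                  = 445,
  "Guadalupe Rv at Kerrville, TX"             = 284,
  "Guadalupe Rv nr Center Point, TX"          = 27,
  "N Fk Guadalupe Rv nr Hunt, TX"             = 288
)

# assumed reporting interval in minutes (Hunt at 5, the rest at 15)
interval_min <- c(15, 15, 5, 15, 15, 15)

# readings expected over the 3 days queried (July 4 through July 6)
expected <- 3 * 24 * 60 / interval_min

# share of expected readings actually present, in percent
completeness <- round(100 * observations / expected, 1)
completeness
```

Under these assumptions, Hunt returned roughly half of its expected readings and Center Point under a tenth, which is consistent with dropping Center Point before export.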