Brian Blaylock
March 14, 2022

Global Ensemble Forecast System (GEFS) reanalysis: 2000-2019#

The GEFS version 12 reanalysis is available on Amazon Web Services and can be retrieved with Herbie.

The GEFS directory structure is different from that of the other models Herbie can access, which makes Herbie’s access to these files a little awkward. Instead of grouping GRIB fields by forecast hour, with many variables for the same lead time in one file, GEFS groups them by variable: each file holds a single variable, and each GRIB message in that file is a different lead time. This changes how you use Herbie to access GEFS data: you need to supply a “variable_level” argument to access a full file. To subset by specific GRIB messages, use the “searchString” argument to key in on the message of interest. You still need to give a value for “fxx” to tell Herbie which directory to look in.

Yeah, it’s a slightly different paradigm for Herbie, but we can work with it. It might be nice to write a herbie.tool that makes these calls to Herbie a little simpler (like a custom bulk_download script).
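
One way such a helper might start is by expanding dates, members, and variable_level values into the arguments for each Herbie call. This is only a rough sketch, not part of Herbie: `gefs_requests` and `download_all` are hypothetical names, and only the keyword names (`model`, `fxx`, `member`, `variable_level`) come from the calls used in this notebook.

```python
from itertools import product


def gefs_requests(dates, members, variable_levels, fxx=12):
    """Yield (date, kwargs) pairs, one per date/member/file combination.

    Hypothetical helper: it only builds arguments and never touches the network.
    """
    for date, member, variable_level in product(dates, members, variable_levels):
        kwargs = {
            "model": "gefs",
            "fxx": fxx,
            "member": member,
            "variable_level": variable_level,
        }
        yield date, kwargs


def download_all(requests):
    """Download each requested file (needs Herbie installed and network access)."""
    # Imported here so gefs_requests can be used without Herbie installed.
    from herbie.archive import Herbie

    for date, kwargs in requests:
        Herbie(date, **kwargs).download()


# Two members times two files for one date gives four requests.
requests = list(gefs_requests(["2017-03-14"], [0, 1], ["tmp_2m", "ugrd_hgt"]))
print(len(requests))  # 4
```

Keeping the argument expansion separate from the download loop means you can inspect or filter the request list before anything is fetched.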

[1]:
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import numpy as np

from herbie.archive import Herbie

# ! The following are imported from
# https://github.com/blaylockbk/Carpenter_Workshop
from paint.standard2 import cm_tmp, cm_wind, cm_wave_height
from toolbox.cartopy_tools import common_features, pc

Retrieve a full file#

Download a full GRIB2 file to your local system.

Remember to specify the following:

- fxx is the lead time, a number between 0 and 384. If you are getting the full file, this only tells Herbie which folder to look in (Days:1-10 or Days:10-16).
- member is the ensemble member. 0 is the control, and 1, 2, 3, or 4 is a perturbation member.
- variable_level is the name of the file to obtain.
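
To see how the three arguments map onto a file in the bucket, here is a minimal sketch that builds the object key by hand. It is not Herbie's own code. The bucket prefix, the folder names, and the file-name pattern follow the bucket listing and the local file name shown elsewhere in this notebook. The `p01`-style label for perturbation members and the 240-hour cutoff between the two folders are assumptions, and the function names are hypothetical.

```python
from datetime import datetime

BUCKET_PREFIX = "noaa-gefs-retrospective/GEFSv12/reforecast"


def member_label(member):
    """Map 0 to 'c00'; map 1-4 to 'p01'-'p04' (the 'p' label is an assumption)."""
    if member == 0:
        return "c00"
    if member in (1, 2, 3, 4):
        return f"p{member:02d}"
    raise ValueError("member must be 0 (control) or 1-4 (perturbation)")


def days_folder(fxx):
    """Pick the folder for a lead time (assumes the split is at 240 hours)."""
    if not 0 <= fxx <= 384:
        raise ValueError("fxx must be between 0 and 384")
    return "Days:1-10" if fxx <= 240 else "Days:10-16"


def gefs_reforecast_key(date, fxx, member, variable_level):
    """Build the bucket key for one file; date is a 'YYYY-MM-DD' string (00 UTC run)."""
    run = datetime.strptime(date, "%Y-%m-%d")
    stamp = run.strftime("%Y%m%d%H")
    label = member_label(member)
    filename = f"{variable_level}_{stamp}_{label}.grib2"
    return "/".join(
        [BUCKET_PREFIX, run.strftime("%Y"), stamp, label, days_folder(fxx), filename]
    )


print(gefs_reforecast_key("2017-03-14", fxx=12, member=0, variable_level="tmp_2m"))
```

Note that fxx only changes the folder, never the file name, which is why a full-file download returns every lead time stored in that folder's file.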

[2]:
H = Herbie("2017-03-14", model="gefs", fxx=12, member=0, variable_level="tmp_2m")
H.download()
C:\Users\blayl_depgywe\BB_python\Herbie\herbie\archive.py:287: UserWarning: `product` not specified. Will use ["GEFSv12/reforecast"].
  warnings.warn(f'`product` not specified. Will use ["{self.product}"].')
🏋🏻‍♂️ Found 2017-Mar-14 00:00 UTC F12 [GEFS] [product=GEFSv12/reforecast] GRIB2 file from local and index file from aws.
🌉 Already have local copy --> C:\Users\blayl_depgywe\data\gefs\20170314\tmp_2m_2017031400_c00.grib2

Download/Retrieve a subset#

Subsetting uses the searchString argument to parse out information from the GRIB2’s index file. Since the variables of each message in a file are all the same, we need to set the searchString to key in on the lead time we are interested in.

Look at the index file to see how to key in on a specific GRIB message.

[3]:
# Look at the search_this column of the index DataFrame
H.read_idx().search_this
[3]:
grib_message
1.0     :TMP:2 m above ground:3 hour fcst:ENS=low-res ctl
2.0     :TMP:2 m above ground:6 hour fcst:ENS=low-res ctl
3.0     :TMP:2 m above ground:9 hour fcst:ENS=low-res ctl
4.0     :TMP:2 m above ground:12 hour fcst:ENS=low-res...
5.0     :TMP:2 m above ground:15 hour fcst:ENS=low-res...
                              ...
76.0    :TMP:2 m above ground:228 hour fcst:ENS=low-re...
77.0    :TMP:2 m above ground:231 hour fcst:ENS=low-re...
78.0    :TMP:2 m above ground:234 hour fcst:ENS=low-re...
79.0    :TMP:2 m above ground:237 hour fcst:ENS=low-re...
80.0    :TMP:2 m above ground:240 hour fcst:ENS=low-re...
Name: search_this, Length: 80, dtype: object
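
The colons in a searchString matter. Assuming the searchString is applied as a regular-expression search against each search_this entry, the sketch below shows why ":6 hour fcst:" is safer than "6 hour fcst". The index lines are illustrative entries written in the same pattern as the listing above, and `matching_messages` is a hypothetical helper, not a Herbie function.

```python
import re

# Illustrative entries in the same pattern as the index listing
# (message number -> search_this text).
index_lines = {
    2: ":TMP:2 m above ground:6 hour fcst:ENS=low-res ctl",
    5: ":TMP:2 m above ground:15 hour fcst:ENS=low-res ctl",
    12: ":TMP:2 m above ground:36 hour fcst:ENS=low-res ctl",
}


def matching_messages(index, pattern):
    """Return the message numbers whose text matches the regex pattern."""
    return [number for number, text in index.items() if re.search(pattern, text)]


# Without the leading colon, "6 hour fcst" also matches "36 hour fcst".
print(matching_messages(index_lines, "6 hour fcst"))     # [2, 12]
# Anchoring on the colons selects exactly one lead time.
print(matching_messages(index_lines, ":6 hour fcst:"))   # [2]
print(matching_messages(index_lines, ":15 hour fcst:"))  # [5]
```
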
[4]:
# Get the 15-h forecast
ds = H.xarray(":15 hour fcst:")
[5]:
ds
[5]:
<xarray.Dataset>
Dimensions:              (latitude: 721, longitude: 1440)
Coordinates:
    number               int32 0
    time                 datetime64[ns] 2017-03-14
    step                 timedelta64[ns] 15:00:00
    heightAboveGround    float64 2.0
  * latitude             (latitude) float64 90.0 89.75 89.5 ... -89.75 -90.0
  * longitude            (longitude) float64 0.0 0.25 0.5 ... 359.2 359.5 359.8
    valid_time           datetime64[ns] 2017-03-14T15:00:00
Data variables:
    t2m                  (latitude, longitude) float32 ...
    gribfile_projection  object None
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          2
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    model:                   gefs
    product:                 GEFSv12/reforecast
    description:             Global Ensemble Forecast System (GEFS)
    remote_grib:             C:\Users\blayl_depgywe\data\gefs\20170314\tmp_2m...
    local_grib:              C:\Users\blayl_depgywe\data\gefs\20170314\tmp_2m...
    searchString:            :15 hour fcst:

Plots#

[6]:
ax = common_features("50m", crs=ds.herbie.crs, figsize=[10, 10]).STATES().BORDERS().ax
p = ax.pcolormesh(
    ds.longitude, ds.latitude, ds.t2m, transform=pc, **cm_tmp(units="K").cmap_kwargs
)
plt.colorbar(
    p, ax=ax, orientation="horizontal", pad=0.05, **cm_tmp(units="K").cbar_kwargs
)

ax.set_title(
    f"{ds.model.upper()}: {H.product_description}\nValid: {ds.valid_time.dt.strftime('%H:%M UTC %d %b %Y').item()}",
    loc="left",
)
ax.set_title(ds.t2m.GRIB_name, loc="right")
C:\Users\blayl_depgywe\miniconda3\envs\herbie\lib\site-packages\metpy\xarray.py:353: UserWarning: More than one time coordinate present for variable "gribfile_projection".
  warnings.warn('More than one ' + axis + ' coordinate present for variable'
[6]:
Text(1.0, 1.0, '2 metre temperature')
[Figure: map of GEFS 2 metre temperature for the 15-hour forecast]

What are valid values for variable_level?#

You need to look at the file structure within the GEFS bucket to know what is available. We can use s3fs to list the files and see which values are valid for the variable_level argument.

[7]:
import s3fs
import pandas as pd
[8]:
# List files in the GEFS bucket for a day
fs = s3fs.S3FileSystem(anon=True)
files = fs.ls(
    path="noaa-gefs-retrospective/GEFSv12/reforecast/2015/2015010100/c00/Days:1-10"
)
[9]:
# Split each file name on "_" to separate the variable and level from the rest
var_lev = [i.split("/")[-1].split("_") for i in files if i.endswith(".grib2")]
[10]:
variable_levels_df = pd.DataFrame(var_lev, columns=["variable", "level", "a", "b", "c"])
variable_levels_df
[10]:
variable level a b c
0 acpcp sfc 2015010100 c00.grib2 None
1 apcp sfc 2015010100 c00.grib2 None
2 cape sfc 2015010100 c00.grib2 None
3 cin sfc 2015010100 c00.grib2 None
4 dlwrf sfc 2015010100 c00.grib2 None
5 dswrf sfc 2015010100 c00.grib2 None
6 gflux sfc 2015010100 c00.grib2 None
7 gust sfc 2015010100 c00.grib2 None
8 hgt ceiling 2015010100 c00.grib2 None
9 hgt hybr 2015010100 c00.grib2 None
10 hgt pres 2015010100 c00.grib2 None
11 hgt pres abv700mb 2015010100 c00.grib2
12 hgt sfc 2015010100 c00.grib2 None
13 hlcy hgt 2015010100 c00.grib2 None
14 lhtfl sfc 2015010100 c00.grib2 None
15 ncpcp sfc 2015010100 c00.grib2 None
16 pbl hgt 2015010100 c00.grib2 None
17 pres hybr 2015010100 c00.grib2 None
18 pres msl 2015010100 c00.grib2 None
19 pres pvor 2015010100 c00.grib2 None
20 pres sfc 2015010100 c00.grib2 None
21 pvort isen 2015010100 c00.grib2 None
22 pwat eatm 2015010100 c00.grib2 None
23 rh hybr 2015010100 c00.grib2 None
24 sfcr sfc 2015010100 c00.grib2 None
25 shtfl sfc 2015010100 c00.grib2 None
26 soilw bgrnd 2015010100 c00.grib2 None
27 spfh 2m 2015010100 c00.grib2 None
28 spfh pres 2015010100 c00.grib2 None
29 spfh pres abv700mb 2015010100 c00.grib2
30 tcdc eatm 2015010100 c00.grib2 None
31 tmax 2m 2015010100 c00.grib2 None
32 tmin 2m 2015010100 c00.grib2 None
33 tmp 2m 2015010100 c00.grib2 None
34 tmp hybr 2015010100 c00.grib2 None
35 tmp pres 2015010100 c00.grib2 None
36 tmp pres abv700mb 2015010100 c00.grib2
37 tmp pvor 2015010100 c00.grib2 None
38 tmp sfc 2015010100 c00.grib2 None
39 tozne eatm 2015010100 c00.grib2 None
40 tsoil bgrnd 2015010100 c00.grib2 None
41 uflx sfc 2015010100 c00.grib2 None
42 ugrd hgt 2015010100 c00.grib2 None
43 ugrd hybr 2015010100 c00.grib2 None
44 ugrd pres 2015010100 c00.grib2 None
45 ugrd pres abv700mb 2015010100 c00.grib2
46 ugrd pvor 2015010100 c00.grib2 None
47 ulwrf sfc 2015010100 c00.grib2 None
48 ulwrf tatm 2015010100 c00.grib2 None
49 uswrf sfc 2015010100 c00.grib2 None
50 vflx sfc 2015010100 c00.grib2 None
51 vgrd hgt 2015010100 c00.grib2 None
52 vgrd hybr 2015010100 c00.grib2 None
53 vgrd pres 2015010100 c00.grib2 None
54 vgrd pres abv700mb 2015010100 c00.grib2
55 vgrd pvor 2015010100 c00.grib2 None
56 vvel pres 2015010100 c00.grib2 None
57 vvel pres abv700mb 2015010100 c00.grib2
58 watr sfc 2015010100 c00.grib2 None
59 weasd sfc 2015010100 c00.grib2 None
[11]:
# These are the available variables
variable_levels_df.variable.unique()
[11]:
array(['acpcp', 'apcp', 'cape', 'cin', 'dlwrf', 'dswrf', 'gflux', 'gust',
       'hgt', 'hlcy', 'lhtfl', 'ncpcp', 'pbl', 'pres', 'pvort', 'pwat',
       'rh', 'sfcr', 'shtfl', 'soilw', 'spfh', 'tcdc', 'tmax', 'tmin',
       'tmp', 'tozne', 'tsoil', 'uflx', 'ugrd', 'ulwrf', 'uswrf', 'vflx',
       'vgrd', 'vvel', 'watr', 'weasd'], dtype=object)
[12]:
# These are the available levels
variable_levels_df.level.unique()
[12]:
array(['sfc', 'ceiling', 'hybr', 'pres', 'hgt', 'msl', 'pvor', 'isen',
       'eatm', 'bgrnd', '2m', 'tatm'], dtype=object)
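
Splitting on "_" puts names like hgt_pres_abv700mb into an extra column, so the variable and level columns alone do not give every usable string. As an alternative sketch, you can strip the trailing date and member from each file name to recover the whole prefix. The file names below are rebuilt from rows of the table above, and `variable_level_from_name` is a hypothetical helper. Whether every recovered prefix works as a variable_level value is an assumption to check against Herbie.

```python
import re

# File names rebuilt from rows of the table above.
names = [
    "acpcp_sfc_2015010100_c00.grib2",
    "hgt_pres_2015010100_c00.grib2",
    "hgt_pres_abv700mb_2015010100_c00.grib2",
    "tmp_2m_2015010100_c00.grib2",
]

# Trailing "_YYYYMMDDHH_<member>.grib2", where member looks like c00.
SUFFIX = re.compile(r"_\d{10}_[a-z]\d{2}\.grib2$")


def variable_level_from_name(name):
    """Return the file-name prefix that precedes the date and member."""
    base = name.split("/")[-1]  # drop any bucket path in front of the file name
    prefix, count = SUFFIX.subn("", base)
    if count != 1:
        raise ValueError(f"unexpected file name: {name}")
    return prefix


print([variable_level_from_name(n) for n in names])
# ['acpcp_sfc', 'hgt_pres', 'hgt_pres_abv700mb', 'tmp_2m']
```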