Global Ensemble Forecast System (GEFS) reanalysis: 2000-2019#
The GEFS version 12 reanalysis is available on Amazon Web Services and can be retrieved with Herbie.
The GEFS directory structure differs from that of the other models Herbie can access, which makes Herbie’s access to these files a little awkward. Most models group GRIB fields by forecast hour, so a single file holds many variables for the same lead time. GEFS files are instead grouped by variable: each file holds one variable, and each GRIB message is a different lead time. This changes how you use Herbie to access GEFS data: you need to supply a “variable_level” argument to access a full file. To subset by specific GRIB messages, use the “searchString” argument to key in on the message of interest. You still need to give a value for “fxx” to tell Herbie which directory to look in.
This is a slightly different paradigm for Herbie, but we can work with it. It may be nice to write a herbie.tool to make calls to Herbie simpler (like a custom bulk_download script).
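As a rough idea of what such a helper could look like, the sketch below only builds the keyword arguments for each file to request. The function name `gefs_request_kwargs` is hypothetical and not part of Herbie, and passing the date as a `date` keyword is an assumption about the Herbie constructor.

```python
from itertools import product


def gefs_request_kwargs(dates, members, variable_levels, fxx=12):
    """Yield one dict of Herbie keyword arguments per GEFS file to fetch.

    Hypothetical helper: it does not call Herbie itself.
    """
    for date, member, variable_level in product(dates, members, variable_levels):
        yield {
            "date": date,  # assumed name of Herbie's date argument
            "model": "gefs",
            "fxx": fxx,  # only selects the directory for full-file downloads
            "member": member,
            "variable_level": variable_level,
        }


requests = list(
    gefs_request_kwargs(
        dates=["2017-03-14"],
        members=range(5),  # control (0) plus perturbation members 1-4
        variable_levels=["tmp_2m", "apcp_sfc"],
    )
)
print(len(requests))

# Each dict could then be handed to Herbie, for example:
# for kwargs in requests:
#     Herbie(**kwargs).download()
```

Keeping the request list separate from the download loop makes it easy to inspect or filter what will be fetched before any data is transferred.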
[1]:
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import numpy as np
from herbie.archive import Herbie
# ! The following are imported from
# https://github.com/blaylockbk/Carpenter_Workshop
from paint.standard2 import cm_tmp, cm_wind, cm_wave_height
from toolbox.cartopy_tools import common_features, pc
Retrieve a full file#
Download a full GRIB2 file to your local system.
Remember to specify the following:
- fxx is the lead time, a number between 0 and 384. If you are getting the full file, this only tells Herbie what folder to look in (Days:1-10 or Days:10-16).
- member is the ensemble member. 0 is the control, and 1, 2, 3, or 4 is a perturbation member.
- variable_level is the name of the file to obtain.
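To make the role of these three arguments concrete, here is a minimal sketch of how they might map onto a file path in the AWS bucket. It is not Herbie's own code. The function name is hypothetical, and two details are assumptions: that lead times up to 240 hours live in Days:1-10 with later ones in Days:10-16, and that perturbation members are coded p01 to p04.

```python
from datetime import datetime

BUCKET_PREFIX = "noaa-gefs-retrospective/GEFSv12/reforecast"


def gefs_reforecast_key(init, fxx, member, variable_level):
    """Build the assumed S3 key for one GEFS reforecast file."""
    if not 0 <= fxx <= 384:
        raise ValueError("fxx must be between 0 and 384")
    if not 0 <= member <= 4:
        raise ValueError("member must be between 0 and 4")

    # Assumption: the split between the two folders falls at 240 hours.
    days = "Days:1-10" if fxx <= 240 else "Days:10-16"
    # Assumption: c00 is the control, p01-p04 are the perturbation members.
    member_code = "c00" if member == 0 else f"p{member:02d}"
    stamp = init.strftime("%Y%m%d%H")

    return (
        f"{BUCKET_PREFIX}/{init:%Y}/{stamp}/{member_code}/{days}/"
        f"{variable_level}_{stamp}_{member_code}.grib2"
    )


key = gefs_reforecast_key(
    datetime(2017, 3, 14), fxx=12, member=0, variable_level="tmp_2m"
)
print(key)
```

Note that fxx changes only the folder part of the key; the file name depends on the initialization time, the member, and variable_level.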
[2]:
H = Herbie("2017-03-14", model="gefs", fxx=12, member=0, variable_level="tmp_2m")
H.download()
C:\Users\blayl_depgywe\BB_python\Herbie\herbie\archive.py:287: UserWarning: `product` not specified. Will use ["GEFSv12/reforecast"].
warnings.warn(f'`product` not specified. Will use ["{self.product}"].')
🏋🏻♂️ Found 2017-Mar-14 00:00 UTC F12 [GEFS] [product=GEFSv12/reforecast] GRIB2 file from local and index file from aws.
🌉 Already have local copy --> C:\Users\blayl_depgywe\data\gefs\20170314\tmp_2m_2017031400_c00.grib2
Download/Retrieve a subset#
Subsetting uses the searchString argument to parse out information from the GRIB2’s index file. Because every message in a file holds the same variable, we need to set the searchString to key in on the lead time we are interested in.
Look at the index file to see how to key in on a specific GRIB message.
[3]:
# Look at the search_this column of the index DataFrame
H.read_idx().search_this
[3]:
grib_message
1.0 :TMP:2 m above ground:3 hour fcst:ENS=low-res ctl
2.0 :TMP:2 m above ground:6 hour fcst:ENS=low-res ctl
3.0 :TMP:2 m above ground:9 hour fcst:ENS=low-res ctl
4.0 :TMP:2 m above ground:12 hour fcst:ENS=low-res...
5.0 :TMP:2 m above ground:15 hour fcst:ENS=low-res...
...
76.0 :TMP:2 m above ground:228 hour fcst:ENS=low-re...
77.0 :TMP:2 m above ground:231 hour fcst:ENS=low-re...
78.0 :TMP:2 m above ground:234 hour fcst:ENS=low-re...
79.0 :TMP:2 m above ground:237 hour fcst:ENS=low-re...
80.0 :TMP:2 m above ground:240 hour fcst:ENS=low-re...
Name: search_this, Length: 80, dtype: object
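The sketch below shows why the colons around the lead time matter when choosing a search string. It assumes Herbie treats searchString as a regular expression matched against each search_this line. It rebuilds lines shaped like the ones listed above, and the helper `matching_lines` is hypothetical.

```python
import re

# Rebuild index lines shaped like those in the listing (3-hourly, 3 h to 240 h).
index_lines = [
    f":TMP:2 m above ground:{hour} hour fcst:ENS=low-res ctl"
    for hour in range(3, 241, 3)
]


def matching_lines(search_string, lines):
    """Return the lines that the regular expression matches anywhere."""
    return [line for line in lines if re.search(search_string, line)]


# Without the colons, "3 hour fcst" also matches 33, 63, 93, ... hour forecasts.
loose = matching_lines("3 hour fcst", index_lines)
# With the colons, only the 3-hour forecast message matches.
strict = matching_lines(":3 hour fcst:", index_lines)

print(len(loose), len(strict))
```

Anchoring the pattern on the surrounding colons keeps the subset to a single GRIB message.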
[4]:
# Get the 15-h forecast
ds = H.xarray(":15 hour fcst:")
[5]:
ds
[5]:
<xarray.Dataset>
Dimensions:              (latitude: 721, longitude: 1440)
Coordinates:
    number               int32 0
    time                 datetime64[ns] 2017-03-14
    step                 timedelta64[ns] 15:00:00
    heightAboveGround    float64 2.0
  * latitude             (latitude) float64 90.0 89.75 89.5 ... -89.75 -90.0
  * longitude            (longitude) float64 0.0 0.25 0.5 ... 359.2 359.5 359.8
    valid_time           datetime64[ns] 2017-03-14T15:00:00
Data variables:
    t2m                  (latitude, longitude) float32 ...
    gribfile_projection  object None
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          2
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    model:                   gefs
    product:                 GEFSv12/reforecast
    description:             Global Ensemble Forecast System (GEFS)
    remote_grib:             C:\Users\blayl_depgywe\data\gefs\20170314\tmp_2m...
    local_grib:              C:\Users\blayl_depgywe\data\gefs\20170314\tmp_2m...
    searchString:            :15 hour fcst:
Plots#
[6]:
ax = common_features("50m", crs=ds.herbie.crs, figsize=[10, 10]).STATES().BORDERS().ax
p = ax.pcolormesh(
    ds.longitude, ds.latitude, ds.t2m, transform=pc, **cm_tmp(units="K").cmap_kwargs
)
plt.colorbar(
    p, ax=ax, orientation="horizontal", pad=0.05, **cm_tmp(units="K").cbar_kwargs
)
ax.set_title(
    f"{ds.model.upper()}: {H.product_description}\nValid: {ds.valid_time.dt.strftime('%H:%M UTC %d %b %Y').item()}",
    loc="left",
)
ax.set_title(ds.t2m.GRIB_name, loc="right")
C:\Users\blayl_depgywe\miniconda3\envs\herbie\lib\site-packages\metpy\xarray.py:353: UserWarning: More than one time coordinate present for variable "gribfile_projection".
warnings.warn('More than one ' + axis + ' coordinate present for variable'
[6]:
Text(1.0, 1.0, '2 metre temperature')

What are valid values for variable_level?#
You need to look at the file structure within the GEFS bucket to know what is available. We can use s3fs to tell us what we can use for our variable_level argument.
[7]:
import s3fs
import pandas as pd
[8]:
# List files in the GEFS bucket for a day
fs = s3fs.S3FileSystem(anon=True)
files = fs.ls(
    path="noaa-gefs-retrospective/GEFSv12/reforecast/2015/2015010100/c00/Days:1-10"
)
[9]:
# Split each GRIB2 file name on underscores to separate the variable, level, and remaining parts
var_lev = [i.split("/")[-1].split("_") for i in files if i.endswith(".grib2")]
[10]:
variable_levels_df = pd.DataFrame(var_lev, columns=["variable", "level", "a", "b", "c"])
variable_levels_df
[10]:
| | variable | level | a | b | c |
---|---|---|---|---|---|
0 | acpcp | sfc | 2015010100 | c00.grib2 | None |
1 | apcp | sfc | 2015010100 | c00.grib2 | None |
2 | cape | sfc | 2015010100 | c00.grib2 | None |
3 | cin | sfc | 2015010100 | c00.grib2 | None |
4 | dlwrf | sfc | 2015010100 | c00.grib2 | None |
5 | dswrf | sfc | 2015010100 | c00.grib2 | None |
6 | gflux | sfc | 2015010100 | c00.grib2 | None |
7 | gust | sfc | 2015010100 | c00.grib2 | None |
8 | hgt | ceiling | 2015010100 | c00.grib2 | None |
9 | hgt | hybr | 2015010100 | c00.grib2 | None |
10 | hgt | pres | 2015010100 | c00.grib2 | None |
11 | hgt | pres | abv700mb | 2015010100 | c00.grib2 |
12 | hgt | sfc | 2015010100 | c00.grib2 | None |
13 | hlcy | hgt | 2015010100 | c00.grib2 | None |
14 | lhtfl | sfc | 2015010100 | c00.grib2 | None |
15 | ncpcp | sfc | 2015010100 | c00.grib2 | None |
16 | pbl | hgt | 2015010100 | c00.grib2 | None |
17 | pres | hybr | 2015010100 | c00.grib2 | None |
18 | pres | msl | 2015010100 | c00.grib2 | None |
19 | pres | pvor | 2015010100 | c00.grib2 | None |
20 | pres | sfc | 2015010100 | c00.grib2 | None |
21 | pvort | isen | 2015010100 | c00.grib2 | None |
22 | pwat | eatm | 2015010100 | c00.grib2 | None |
23 | rh | hybr | 2015010100 | c00.grib2 | None |
24 | sfcr | sfc | 2015010100 | c00.grib2 | None |
25 | shtfl | sfc | 2015010100 | c00.grib2 | None |
26 | soilw | bgrnd | 2015010100 | c00.grib2 | None |
27 | spfh | 2m | 2015010100 | c00.grib2 | None |
28 | spfh | pres | 2015010100 | c00.grib2 | None |
29 | spfh | pres | abv700mb | 2015010100 | c00.grib2 |
30 | tcdc | eatm | 2015010100 | c00.grib2 | None |
31 | tmax | 2m | 2015010100 | c00.grib2 | None |
32 | tmin | 2m | 2015010100 | c00.grib2 | None |
33 | tmp | 2m | 2015010100 | c00.grib2 | None |
34 | tmp | hybr | 2015010100 | c00.grib2 | None |
35 | tmp | pres | 2015010100 | c00.grib2 | None |
36 | tmp | pres | abv700mb | 2015010100 | c00.grib2 |
37 | tmp | pvor | 2015010100 | c00.grib2 | None |
38 | tmp | sfc | 2015010100 | c00.grib2 | None |
39 | tozne | eatm | 2015010100 | c00.grib2 | None |
40 | tsoil | bgrnd | 2015010100 | c00.grib2 | None |
41 | uflx | sfc | 2015010100 | c00.grib2 | None |
42 | ugrd | hgt | 2015010100 | c00.grib2 | None |
43 | ugrd | hybr | 2015010100 | c00.grib2 | None |
44 | ugrd | pres | 2015010100 | c00.grib2 | None |
45 | ugrd | pres | abv700mb | 2015010100 | c00.grib2 |
46 | ugrd | pvor | 2015010100 | c00.grib2 | None |
47 | ulwrf | sfc | 2015010100 | c00.grib2 | None |
48 | ulwrf | tatm | 2015010100 | c00.grib2 | None |
49 | uswrf | sfc | 2015010100 | c00.grib2 | None |
50 | vflx | sfc | 2015010100 | c00.grib2 | None |
51 | vgrd | hgt | 2015010100 | c00.grib2 | None |
52 | vgrd | hybr | 2015010100 | c00.grib2 | None |
53 | vgrd | pres | 2015010100 | c00.grib2 | None |
54 | vgrd | pres | abv700mb | 2015010100 | c00.grib2 |
55 | vgrd | pvor | 2015010100 | c00.grib2 | None |
56 | vvel | pres | 2015010100 | c00.grib2 | None |
57 | vvel | pres | abv700mb | 2015010100 | c00.grib2 |
58 | watr | sfc | 2015010100 | c00.grib2 | None |
59 | weasd | sfc | 2015010100 | c00.grib2 | None |
[11]:
# These are the available variables
variable_levels_df.variable.unique()
[11]:
array(['acpcp', 'apcp', 'cape', 'cin', 'dlwrf', 'dswrf', 'gflux', 'gust',
       'hgt', 'hlcy', 'lhtfl', 'ncpcp', 'pbl', 'pres', 'pvort', 'pwat',
       'rh', 'sfcr', 'shtfl', 'soilw', 'spfh', 'tcdc', 'tmax', 'tmin',
       'tmp', 'tozne', 'tsoil', 'uflx', 'ugrd', 'ulwrf', 'uswrf', 'vflx',
       'vgrd', 'vvel', 'watr', 'weasd'], dtype=object)
[12]:
# These are the available levels
variable_levels_df.level.unique()
[12]:
array(['sfc', 'ceiling', 'hybr', 'pres', 'hgt', 'msl', 'pvor', 'isen',
       'eatm', 'bgrnd', '2m', 'tatm'], dtype=object)
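Splitting on every underscore puts names such as hgt_pres_abv700mb across three columns, so the variable and level columns alone do not always give the full variable_level string. A minimal sketch of one way to recover it is to strip the date and member suffix instead. This assumes file names follow a `<variable_level>_<YYYYMMDDHH>_<member>.grib2` pattern, as the listing suggests. The function name is hypothetical.

```python
import re

# Assumption: file names look like <variable_level>_<YYYYMMDDHH>_<member>.grib2
FILE_PATTERN = re.compile(
    r"^(?P<variable_level>.+)_(?P<init>\d{10})_(?P<member>[cp]\d{2})\.grib2$"
)


def variable_level_from_path(path):
    """Return the variable_level part of a GEFS file path, or None if no match."""
    name = path.split("/")[-1]
    match = FILE_PATTERN.match(name)
    return match.group("variable_level") if match else None


prefix = "noaa-gefs-retrospective/GEFSv12/reforecast/2015/2015010100/c00/Days:1-10"
sample_files = [
    f"{prefix}/tmp_2m_2015010100_c00.grib2",
    f"{prefix}/hgt_pres_abv700mb_2015010100_c00.grib2",
]

levels = [variable_level_from_path(path) for path in sample_files]
print(levels)
```

The same function could be applied to the `files` list returned by `fs.ls` to get strings that can be passed directly as variable_level.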