Quick Tutorial

There are two main ways to use Herbie:

  1. When working with one file at a time, use the Herbie class imported from herbie.archive to create Herbie objects.

  2. When working with many files across different dates and forecast lead times, use the helper functions in herbie.tools.

Creating a Herbie Object

The Herbie class gives you details about a single GRIB2 file, with methods to download the file, open it with xarray, and subset it by variable.

What does this class do? When you specify a datetime, model type, and forecast lead time, Herbie searches the different archive sources for the file you are requesting. By default, it searches for the HRRR model (model='hrrr') surface fields (product='sfc') for the zero-hour lead time (fxx=0).

[1]:
from herbie.archive import Herbie
[2]:
help(Herbie)
Help on class Herbie in module herbie.archive:

class Herbie(builtins.object)
 |  Herbie(date=None, *, valid_date=None, model='hrrr', fxx=0, product=None, priority=None, save_dir=WindowsPath('C:/Users/blayl_depgywe/data'), overwrite=False, verbose=True, **kwargs)
 |
 |  Locate GRIB2 file at one of the archive sources.
 |
 |  Parameters
 |  ----------
 |  date : pandas-parsable datetime
 |      *Model initialization datetime*.
 |      If None, then must set ``valid_date``.
 |  valid_date : pandas-parsable datetime
 |      Model valid datetime. Must set when ``date`` is None.
 |  fxx : int
 |      Forecast lead time in hours. Available lead times depend on
 |      the model type and model version. Range is model and run
 |      dependant.
 |  model : {'hrrr', 'hrrrak', 'rap', 'gfs', 'gfs_wave', 'ecmwf', 'rrfs', etc.}
 |      Model name as defined in the models template folder. CASE INSENSITIVE
 |      Some examples:
 |      - ``'hrrr'`` HRRR contiguous United States model
 |      - ``'hrrrak'`` HRRR Alaska model (alias ``'alaska'``)
 |      - ``'rap'`` RAP model
 |      - ``'ecmwf'`` ECMWF open data forecat products
 |  product : {'sfc', 'prs', 'nat', 'subh'}
 |      Output variable product file type. If not specified, will
 |      use first product in model template file. CASE SENSITIVE.
 |      For example, the HRRR model has these products:
 |      - ``'sfc'`` surface fields
 |      - ``'prs'`` pressure fields
 |      - ``'nat'`` native fields
 |      - ``'subh'`` subhourly fields
 |  member : None or int
 |      Some ensemble models (e.g. the future RRFS) will need to
 |      specify an ensemble member.
 |  priority : list or str
 |      List of model sources to get the data in the order of
 |      download priority. CASE INSENSITIVE. Some example data
 |      sources and the default priority order are listed below.
 |      - ``'aws'`` Amazon Web Services (Big Data Program)
 |      - ``'nomads'`` NOAA's NOMADS server
 |      - ``'google'`` Google Cloud Platform (Big Data Program)
 |      - ``'azure'`` Microsoft Azure (Big Data Program)
 |      - ``'pando'`` University of Utah Pando Archive (gateway 1)
 |      - ``'pando2'`` University of Utah Pando Archive (gateway 2)
 |  save_dir : str or pathlib.Path
 |      Location to save GRIB2 files locally. Default save directory
 |      is set in ``~/.config/herbie/config.cfg``.
 |  Overwrite : bool
 |      If True, look for GRIB2 files even if local copy exists.
 |      If False (default), use the local copy (still need to find
 |      the idx file).
 |  **kwargs
 |      Any other paremeter needed to satisfy the conditions in the
 |      model template file (e.g., nest=2, other_label='run2')
 |
 |  Methods defined here:
 |
 |  __init__(self, date=None, *, valid_date=None, model='hrrr', fxx=0, product=None, priority=None, save_dir=WindowsPath('C:/Users/blayl_depgywe/data'), overwrite=False, verbose=True, **kwargs)
 |      Specify model output and find GRIB2 file at one of the sources.
 |
 |  __repr__(self)
 |      Representation in Notebook
 |
 |  __str__(self)
 |      When Herbie class object is printed, print all properties
 |
 |  download(self, searchString=None, *, source=None, save_dir=None, overwrite=None, verbose=None, errors='warn')
 |      Download file from source.
 |
 |      Subsetting by variable follows the same principles described here:
 |      https://www.cpc.ncep.noaa.gov/products/wesley/fast_downloading_grib.html
 |
 |      Parameters
 |      ----------
 |      searchString : str
 |          If None, download the full file. Else, use regex to subset
 |          the file by specific variables and levels.
 |          .. include:: ../../user_guide/searchString.rst
 |      source : {'nomads', 'aws', 'google', 'azure', 'pando', 'pando2'}
 |          If None, download GRIB2 file from self.grib2 which is
 |          the first location the GRIB2 file was found from the
 |          priority lists when this class was initialized. Else, you
 |          may specify the source to force downloading it from a
 |          different location.
 |      save_dir : str or pathlib.Path
 |          Location to save the model output files.
 |          If None, uses the default or path specified in __init__.
 |          Else, changes the path files are saved.
 |      overwrite : bool
 |          If True, overwrite existing files. Default will skip
 |          downloading if the full file exists. Not applicable when
 |          when searchString is not None because file subsets might
 |          be unique.
 |      errors : {'warn', 'raise'}
 |          When an error occurs, send a warning or raise a value error.
 |
 |  find_grib(self, overwrite=False)
 |      Find a GRIB file from the archive sources
 |
 |      Returns
 |      -------
 |      1) The URL or pathlib.Path to the GRIB2 files that exists
 |      2) The source of the GRIB2 file
 |
 |  find_idx(self)
 |      Find an index file for the GRIB file
 |
 |  get_localFilePath(self, searchString=None)
 |      Get path to local file
 |
 |  index_as_dataframe = <functools.cached_property object>
 |      Read and cache the full index file
 |
 |  read_idx(self, searchString=None)
 |      Inspect the GRIB2 file contents by reading the index file.
 |
 |      This reads index files created with the wgrib2 utility.
 |
 |      Parameters
 |      ----------
 |      searchString : str
 |          Filter dataframe by a searchString regular expression.
 |          Searches for strings in the index file lines, specifically
 |          the variable, level, and forecast_time columns.
 |          Execute ``_searchString_help()`` for examples of a good
 |          searchString.
 |
 |          .. include:: ../../user_guide/searchString.rst
 |
 |      Returns
 |      -------
 |      A Pandas DataFrame of the index file.
 |
 |  xarray(self, searchString=None, backend_kwargs={}, remove_grib=True, **download_kwargs)
 |      Open GRIB2 data as xarray DataSet
 |
 |      Parameters
 |      ----------
 |      searchString : str
 |          Variables to read into xarray Dataset
 |      remove_grib : bool
 |          If True, grib file will be removed ONLY IF it didn't exist
 |          before we downloaded it.
 |
 |  ----------------------------------------------------------------------
 |  Readonly properties defined here:
 |
 |  __logo__
 |      For Fun, show the Herbie Logo
 |
 |  get_localFileName
 |      Predict Local File Name
 |
 |  get_remoteFileName
 |      Predict Remote File Name
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)

[3]:
H = Herbie("2022-4-23 00:00")
🏋🏻‍♂️ Found 2022-Apr-23 00:00 UTC F00 [HRRR] [product=sfc] GRIB2 file from aws and index file from aws.

The Herbie object tells us a file matching our request was found on Amazon Web Services (AWS).

We can display some of the details of the Herbie object by printing it.

[4]:
print(H)


                                             
   █ ██
   █ ██ ┏━┓ ┏━┓            ┏━┓   ┏━┓
   █ ██ ┃ ┃ ┃ ┃┏━━━━┓┏━┓┏━┓┃ ┃   ┏━┓┏━━━━┓
   █ ██ ┃ ┗━┛ ┃┃ ━━ ┃┃ ┏━━┛┃ ┗━━┓┃ ┃┃ ━━ ┃
   █ ██ ┃ ┏━┓ ┃┃ ━━━┓┃ ┃   ┃ ━━ ┃┃ ┃┃ ━━━┓
   █ ██ ┗━┛ ┗━┛┗━━━━┛┗━┛   ┗━━━━┛┗━┛┗━━━━┛
   █ ██
           🏁 Retrieve NWP Model Data 🏁     
                                             


self.DESCRIPTION=High-Resolution Rapid Refresh - CONUS
self.DETAILS={'NOMADS product description': 'https://www.nco.ncep.noaa.gov/pmb/products/hrrr/', 'University of Utah HRRR archive': 'http://hrrr.chpc.utah.edu/'}
self.EXPECT_IDX_FILE=remote
self.IDX_STYLE=wgrib2
self.LOCALFILE=hrrr.t00z.wrfsfcf00.grib2
self.PRODUCTS={'sfc': '2D surface level fields; 3-km resolution', 'prs': '3D pressure level fields; 3-km resolution', 'nat': 'Native level fields; 3-km resolution', 'subh': 'Subhourly grids; 3-km resolution'}
self.SOURCES={'aws': 'https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20220423/conus/hrrr.t00z.wrfsfcf00.grib2', 'nomads': 'https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/hrrr.20220423/conus/hrrr.t00z.wrfsfcf00.grib2', 'google': 'https://storage.googleapis.com/high-resolution-rapid-refresh/hrrr.20220423/conus/hrrr.t00z.wrfsfcf00.grib2', 'azure': 'https://noaahrrr.blob.core.windows.net/hrrr/hrrr.20220423/conus/hrrr.t00z.wrfsfcf00.grib2', 'pando': 'https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20220423/hrrr.t00z.wrfsfcf00.grib2', 'pando2': 'https://pando-rgw02.chpc.utah.edu/hrrr/sfc/20220423/hrrr.t00z.wrfsfcf00.grib2'}
self.fxx=0
self.get_localFileName=hrrr.t00z.wrfsfcf00.grib2
self.get_remoteFileName=hrrr.t00z.wrfsfcf00.grib2
self.grib=https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20220423/conus/hrrr.t00z.wrfsfcf00.grib2
self.grib_source=aws
self.idx=https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20220423/conus/hrrr.t00z.wrfsfcf00.grib2.idx
self.idx_source=aws
self.model=hrrr
self.overwrite=False
self.product=sfc
self.product_description=2D surface level fields; 3-km resolution
self.searchString_help=
Use regular expression to search for lines in the index file.
Here are some examples you can use for the wgrib2-style `searchString`

    ============================= ===============================================
    ``searchString=``             Messages that will be downloaded
    ============================= ===============================================
    ":TMP:2 m"                    Temperature at 2 m.
    ":TMP:"                       Temperature fields at all levels.
    ":UGRD:.* mb"                 U Wind at all pressure levels.
    ":500 mb:"                    All variables on the 500 mb level.
    ":APCP:"                      All accumulated precipitation fields.
    ":APCP:surface:0-[1-9]*"      Accumulated precip since initialization time
    ":APCP:surface:[1-9]*-[1-9]*" Accumulated precip over last hour
    ":UGRD:10 m"                  U wind component at 10 meters.
    ":(U|V)GRD:(10|80) m"         U and V wind component at 10 and 80 m.
    ":(U|V)GRD:"                  U and V wind component at all levels.
    ":(?:U|V)GRD:[0-9]+ hybrid"   U and V wind components at all hybrid levels
    ":(?:U|V)GRD:[0-9]+ mb"        U and V wind components at all pressure levels
    ":.GRD:"                      (Same as above)
    ":(TMP|DPT):"                 Temperature and Dew Point for all levels .
    ":(TMP|DPT|RH):"              TMP, DPT, and Relative Humidity for all levels.
    ":REFC:"                      Composite Reflectivity
    ":surface:"                   All variables at the surface.
    ============================= ===============================================

If you need help with regular expression, search the web or look at
this cheatsheet: https://www.petefreitag.com/cheatsheets/regex/.

self.verbose=True

Now let's look at the GRIB2 and index file URLs.

[5]:
print(H.grib)
print(H.idx)
https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20220423/conus/hrrr.t00z.wrfsfcf00.grib2
https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20220423/conus/hrrr.t00z.wrfsfcf00.grib2.idx

Generally, you will only need to search for files using the default source priority order. But you can change the priority order if you wish.

[6]:
# Specify the source priority to only look on Pando
H = Herbie("2022-1-5", priority="pando")
🏋🏻‍♂️ Found 2022-Jan-05 00:00 UTC F00 [HRRR] [product=sfc] GRIB2 file from pando and index file from pando.
[7]:
# Specify the source priority to only look on NOMADS
H = Herbie("2022-1-5", priority="nomads")
💔 Did not find a GRIB2 or Index File for 2022-Jan-05 00:00 UTC F00 HRRR

The file was not found on the NOMADS server. We can tell Herbie to look at AWS after looking at NOMADS.

[8]:
# Specify the source priority.
H = Herbie("2021-5-5", priority=["nomads", "aws"])
🏋🏻‍♂️ Found 2021-May-05 00:00 UTC F00 [HRRR] [product=sfc] GRIB2 file from aws and index file from aws.
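
The priority list can be thought of as an ordered, first-match search: each source is tried in turn and the first one that has the file wins. Below is a minimal offline sketch of that idea. It is not Herbie's implementation; `first_available` and `fake_exists` are hypothetical names, and the availability check is faked rather than performed over the network. The two URLs are copied from the printed Herbie object earlier on this page.

```python
# Two of the source URLs shown in the printed Herbie object above.
SOURCES = {
    "aws": "https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20220423/conus/hrrr.t00z.wrfsfcf00.grib2",
    "nomads": "https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/hrrr.20220423/conus/hrrr.t00z.wrfsfcf00.grib2",
}


def first_available(sources, priority, exists):
    """Return (name, url) for the first source in `priority` whose URL exists, else None."""
    for name in priority:
        url = sources[name]
        if exists(url):
            return name, url
    return None


def fake_exists(url):
    # Assumption for this demo: only the AWS copy is still available.
    return "amazonaws" in url


found = first_available(SOURCES, ["nomads", "aws"], fake_exists)
not_found = first_available(SOURCES, ["nomads"], fake_exists)
print(found)
print(not_found)
```

With `priority=["nomads", "aws"]` the sketch falls through to AWS, while `priority=["nomads"]` alone finds nothing, mirroring the two outcomes shown above.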

OK, let's ask for the 15-hour forecast from our requested datetime.

[9]:
H = Herbie("2021-5-5", fxx=15)
🏋🏻‍♂️ Found 2021-May-05 00:00 UTC F15 [HRRR] [product=sfc] GRIB2 file from aws and index file from aws.

We can also tell Herbie that the datetime we are requesting is the valid time. Herbie will adjust the model run time by the lead time requested.

[10]:
H = Herbie(valid_date="2021-5-5", fxx=15)
🏋🏻‍♂️ Found 2021-May-04 09:00 UTC F15 [HRRR] [product=sfc] GRIB2 file from local and index file from aws.
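
The adjustment is plain date arithmetic: the model run (initialization) time is the valid time minus the lead time. A small sketch using only the standard library, where `init_time_from_valid` is a hypothetical helper name and not part of Herbie:

```python
from datetime import datetime, timedelta


def init_time_from_valid(valid_time, fxx):
    """Model initialization time for a forecast valid at `valid_time` with a lead time of `fxx` hours."""
    return valid_time - timedelta(hours=fxx)


run_time = init_time_from_valid(datetime(2021, 5, 5, 0, 0), fxx=15)
print(run_time)  # 2021-05-04 09:00:00
```

This matches the run time reported above: a 15-hour forecast valid at 2021-May-05 00:00 UTC comes from the 2021-May-04 09:00 UTC run.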

Download a Full File

If the file exists at one of the source locations, Herbie can download the full file to your local drive.

[14]:
H = Herbie(valid_date="2021-5-5", fxx=15)
H.download(verbose=True)
🏋🏻‍♂️ Found 2021-May-04 09:00 UTC F15 [HRRR] [product=sfc] GRIB2 file from aws and index file from aws.
👨🏻‍🏭 Created directory: [C:\Users\blayl_depgywe\data\hrrr\20210504]
✅ Success! Downloaded HRRR from aws
        src: https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20210504/conus/hrrr.t09z.wrfsfcf15.grib2
        dst: C:\Users\blayl_depgywe\data\hrrr\20210504\hrrr.t09z.wrfsfcf15.grib2

Now that the file has been downloaded, asking Herbie for the same file reports that it is stored locally. (Index files are never downloaded, so Herbie still searches the source locations for the index file.)

[15]:
H = Herbie(valid_date="2021-5-5", fxx=15)
🏋🏻‍♂️ Found 2021-May-04 09:00 UTC F15 [HRRR] [product=sfc] GRIB2 file from local and index file from aws.

Download a Subset File

Often you don't need the full file, just a few variables. Because the index file tells us the byte range of each variable (GRIB message), we can download only that portion of the file. Files can therefore be subset by variable. (Note that Herbie cannot subset the file by geographic area.)

In this example, we download all variables at 2 m above ground from the 15-hour forecast.

[16]:
# The full file already exists on Local Disk
H = Herbie(valid_date="2021-5-5", fxx=15)
H.download(":2 m", verbose=True)
🏋🏻‍♂️ Found 2021-May-04 09:00 UTC F15 [HRRR] [product=sfc] GRIB2 file from local and index file from aws.
📇 Download subset: [HRRR] model [sfc] product run at 2021-May-04 09:00 UTC F15
 cURL from file://C:\Users\blayl_depgywe\data\hrrr\20210504\hrrr.t09z.wrfsfcf15.grib2
  58  :LTPINX:2 m above ground:15 hour fcst
  71  :TMP:2 m above ground:15 hour fcst
  72  :POT:2 m above ground:15 hour fcst
  73  :SPFH:2 m above ground:15 hour fcst
  74  :DPT:2 m above ground:15 hour fcst
  75  :RH:2 m above ground:15 hour fcst
💾 Saved the above subset to C:\Users\blayl_depgywe\data\hrrr\20210504\hrrr.t09z.wrfsfcf15.grib2.subset_cac774946a4df951a374b169e9345aa6ab9ed8b0

If we ask to download this file again, Herbie tells us we already have a local copy. But we can overwrite it if we need to.

[18]:
# The Subset File Already Exists
H = Herbie(valid_date="2021-5-5", fxx=15)
H.download(":2 m", verbose=True, overwrite=True)
🏋🏻‍♂️ Found 2021-May-04 09:00 UTC F15 [HRRR] [product=sfc] GRIB2 file from local and index file from aws.
📇 Download subset: [HRRR] model [sfc] product run at 2021-May-04 09:00 UTC F15
 cURL from file://C:\Users\blayl_depgywe\data\hrrr\20210504\hrrr.t09z.wrfsfcf15.grib2
  58  :LTPINX:2 m above ground:15 hour fcst
  71  :TMP:2 m above ground:15 hour fcst
  72  :POT:2 m above ground:15 hour fcst
  73  :SPFH:2 m above ground:15 hour fcst
  74  :DPT:2 m above ground:15 hour fcst
  75  :RH:2 m above ground:15 hour fcst
💾 Saved the above subset to C:\Users\blayl_depgywe\data\hrrr\20210504\hrrr.t09z.wrfsfcf15.grib2.subset_cac774946a4df951a374b169e9345aa6ab9ed8b0
[19]:
# Now download the full file with overwrite
H = Herbie(valid_date="2021-5-5", fxx=15, overwrite=True)
H.download()
🏋🏻‍♂️ Found 2021-May-04 09:00 UTC F15 [HRRR] [product=sfc] GRIB2 file from aws and index file from aws.
🚛💨  Download Progress: 100.00% of 158.5 MB

Index files and Subset Search String

Each GRIB2 file should have a companion inventory, or index, file. The index file usually has the same name as the GRIB2 file with the .idx suffix appended. This file is important because it tells us the byte range of each GRIB message (variable), which enables a partial download of the file using cURL.
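
To make the byte-range idea concrete, here is a rough sketch that parses a few index lines and turns a search pattern into byte ranges. It assumes the wgrib2 short-inventory layout (`message:start_byte:d=YYYYMMDDHH:variable:level:forecast:`); the sample lines are reconstructed from the last four rows of the index listing later on this page. `parse_index` and `byte_ranges` are hypothetical helpers, not Herbie functions.

```python
import re

# Index lines in wgrib2 short-inventory style (assumed layout, see lead-in).
IDX_TEXT = """\
170:151937231:d=2021050409:SBT123:top of atmosphere:15 hour fcst:
171:153642912:d=2021050409:SBT124:top of atmosphere:15 hour fcst:
172:155314328:d=2021050409:SBT113:top of atmosphere:15 hour fcst:
173:156883538:d=2021050409:SBT114:top of atmosphere:15 hour fcst:
"""


def parse_index(text):
    """Parse index lines into dicts with message number, byte span, and searchable text."""
    rows = []
    for line in text.strip().splitlines():
        msg, start, _date, var, level, fcst = line.split(":")[:6]
        rows.append(
            {"msg": int(msg), "start": int(start), "search_this": f":{var}:{level}:{fcst}"}
        )
    # A message ends one byte before the next message starts.
    for row, nxt in zip(rows, rows[1:]):
        row["end"] = nxt["start"] - 1
    # The last message has no successor, so its range is open-ended.
    rows[-1]["end"] = None
    return rows


def byte_ranges(rows, pattern):
    """Return 'start-end' strings for every message whose text matches `pattern`."""
    ranges = []
    for row in rows:
        if re.search(pattern, row["search_this"]):
            end = "" if row["end"] is None else row["end"]
            ranges.append(f"{row['start']}-{end}")
    return ranges


rows = parse_index(IDX_TEXT)
ranges = byte_ranges(rows, r":SBT12[34]:")
last = byte_ranges(rows, r":SBT114:")
print(ranges)  # ['151937231-153642911', '153642912-155314327']
print(last)    # ['156883538-']
```

Each string can be passed to a ranged request, for example `curl --range 151937231-153642911 <grib2 url>`. HTTP byte ranges are inclusive, which is why this sketch ends each range one byte before the next message begins.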

The magic trick for subsetting the data comes down to the search string. Herbie uses a regular expression to search the lines of the index file and select which GRIB messages to download. Some examples follow.

============================= ===============================================
searchString                  Messages that will be downloaded
============================= ===============================================
":TMP:2 m"                    Temperature at 2 m.
":TMP:"                       Temperature fields at all levels.
":UGRD:.* mb"                 U Wind at all pressure levels.
":500 mb:"                    All variables on the 500 mb level.
":APCP:"                      All accumulated precipitation fields.
":APCP:surface:0-[1-9]*"      Accumulated precip since initialization time
":APCP:surface:[1-9]*-[1-9]*" Accumulated precip over last hour
":UGRD:10 m"                  U wind component at 10 meters.
":(?:U|V)GRD:(?:10|80) m"     U and V wind component at 10 and 80 m.
":(?:U|V)GRD:"                U and V wind component at all levels.
":.GRD:"                      (Same as above)
":(?:TMP|DPT):"               Temperature and Dew Point for all levels.
":(?:TMP|DPT|RH):"            TMP, DPT, and Relative Humidity for all levels.
":REFC:"                      Composite Reflectivity
":surface:"                   All variables at the surface.
============================= ===============================================

If you need help with regular expressions, search the web or look at this cheatsheet: https://www.petefreitag.com/cheatsheets/regex/.
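
To get a feel for how these patterns select messages, the sketch below applies a few of them to sample index strings with Python's `re.search`. The sample strings are taken from index listings on this page; `select` is a hypothetical helper, and using `re.search` here is a stand-in for the regular-expression match Herbie applies to the index rows.

```python
import re

# Sample index strings in ":variable:level:forecast" form, as they appear on this page.
LINES = [
    ":TMP:2 m above ground:15 hour fcst",
    ":DPT:2 m above ground:15 hour fcst",
    ":UGRD:10 m above ground:15 hour fcst",
    ":VGRD:10 m above ground:15 hour fcst",
    ":UGRD:80 m above ground:15 hour fcst",
    ":UGRD:500 mb:15 hour fcst",
    ":REFC:entire atmosphere:15 hour fcst",
]


def select(pattern, lines=LINES):
    """Return the lines in which the regular expression matches anywhere."""
    return [line for line in lines if re.search(pattern, line)]


for pattern in [":TMP:2 m", ":(?:U|V)GRD:(?:10|80) m", ":UGRD:.* mb"]:
    print(pattern)
    for line in select(pattern):
        print("   ", line)
```

Note that `":UGRD:.* mb"` picks up only the 500 mb line: the height-above-ground levels contain `" m "` but never `" mb"`.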

Herbie reads the index file into a pandas DataFrame. The regular expression is matched against the "search_this" column to select rows of the index file.

[20]:
H.read_idx()
[20]:
     grib_message  start_byte  end_byte   range                reference_time       valid_time  variable                                           level              forecast_time  search_this
0    1             0           636814     0-636814             2021-05-04 09:00:00  2021-05-05  REFC                                               entire atmosphere  15 hour fcst   :REFC:entire atmosphere:15 hour fcst
1    2             636814      962034     636814-962034        2021-05-04 09:00:00  2021-05-05  RETOP                                              cloud top          15 hour fcst   :RETOP:cloud top:15 hour fcst
2    3             962034      1603774    962034-1603774       2021-05-04 09:00:00  2021-05-05  var discipline=0 center=7 local_table=1 parmca...  entire atmosphere  15 hour fcst   :var discipline=0 center=7 local_table=1 parmc...
3    4             1603774     1910611    1603774-1910611      2021-05-04 09:00:00  2021-05-05  VIL                                                entire atmosphere  15 hour fcst   :VIL:entire atmosphere:15 hour fcst
4    5             1910611     3239678    1910611-3239678      2021-05-04 09:00:00  2021-05-05  VIS                                                surface            15 hour fcst   :VIS:surface:15 hour fcst
...  ...           ...         ...        ...                  ...                  ...         ...                                                ...                ...            ...
168  169           151936733   151937231  151936733-151937231  2021-05-04 09:00:00  2021-05-05  ICEC                                               surface            15 hour fcst   :ICEC:surface:15 hour fcst
169  170           151937231   153642912  151937231-153642912  2021-05-04 09:00:00  2021-05-05  SBT123                                             top of atmosphere  15 hour fcst   :SBT123:top of atmosphere:15 hour fcst
170  171           153642912   155314328  153642912-155314328  2021-05-04 09:00:00  2021-05-05  SBT124                                             top of atmosphere  15 hour fcst   :SBT124:top of atmosphere:15 hour fcst
171  172           155314328   156883538  155314328-156883538  2021-05-04 09:00:00  2021-05-05  SBT113                                             top of atmosphere  15 hour fcst   :SBT113:top of atmosphere:15 hour fcst
172  173           156883538              156883538-           2021-05-04 09:00:00  2021-05-05  SBT114                                             top of atmosphere  15 hour fcst   :SBT114:top of atmosphere:15 hour fcst

173 rows × 10 columns

[21]:
# See what messages will be downloaded by a search string.
H.read_idx("(?:U|V)GRD:(?:10|80) m")
[21]:
     grib_message  start_byte  end_byte   range                reference_time       valid_time  variable  level              forecast_time  search_this
59   60            36906392    38106734   36906392-38106734    2021-05-04 09:00:00  2021-05-05  UGRD      80 m above ground  15 hour fcst   :UGRD:80 m above ground:15 hour fcst
60   61            38106734    39269979   38106734-39269979    2021-05-04 09:00:00  2021-05-05  VGRD      80 m above ground  15 hour fcst   :VGRD:80 m above ground:15 hour fcst
76   77            52662153    55043768   52662153-55043768    2021-05-04 09:00:00  2021-05-05  UGRD      10 m above ground  15 hour fcst   :UGRD:10 m above ground:15 hour fcst
77   78            55043768    57425383   55043768-57425383    2021-05-04 09:00:00  2021-05-05  VGRD      10 m above ground  15 hour fcst   :VGRD:10 m above ground:15 hour fcst
[22]:
# See what messages will be downloaded by a search string.
# Non-capturing groups avoid the pandas "match groups" warning, and the
# explicit alternation matches only the 850 mb and 500 mb levels.
H.read_idx("(?:U|V)GRD:(?:850|500) mb")
[22]:
     grib_message  start_byte  end_byte   range                reference_time       valid_time  variable  level              forecast_time  search_this
16   17            11102539    11718442   11102539-11718442    2021-05-04 09:00:00  2021-05-05  UGRD      500 mb             15 hour fcst   :UGRD:500 mb:15 hour fcst
17   18            11718442    12320354   11718442-12320354    2021-05-04 09:00:00  2021-05-05  VGRD      500 mb             15 hour fcst   :VGRD:500 mb:15 hour fcst
27   28            19477225    20120596   19477225-20120596    2021-05-04 09:00:00  2021-05-05  UGRD      850 mb             15 hour fcst   :UGRD:850 mb:15 hour fcst
28   29            20120596    20748701   20120596-20748701    2021-05-04 09:00:00  2021-05-05  VGRD      850 mb             15 hour fcst   :VGRD:850 mb:15 hour fcst
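
A note on matching specific pressure levels: a pattern built from character classes, such as `[8|5][0|5]0 mb`, accepts more than 850 and 500, because inside square brackets `|` is a literal character and every combination of the listed digits is allowed. The sketch below compares it with an explicit alternation. The 550, 800, and 700 mb test strings are made up for illustration; whether such levels exist depends on the product you request.

```python
import re

class_pattern = r"(U|V)GRD:[8|5][0|5]0 mb"    # character classes: any of 8, |, 5 then 0, |, 5 then 0
alt_pattern = r"(?:U|V)GRD:(?:850|500) mb"    # explicit alternation: only 850 or 500

samples = [
    ":UGRD:500 mb:anl",
    ":UGRD:550 mb:anl",  # hypothetical level
    ":UGRD:700 mb:anl",  # hypothetical level
    ":UGRD:800 mb:anl",  # hypothetical level
    ":UGRD:850 mb:anl",
]

class_hits = [s for s in samples if re.search(class_pattern, s)]
alt_hits = [s for s in samples if re.search(alt_pattern, s)]

print("character classes:", class_hits)
print("alternation:      ", alt_hits)
```

The character-class version also selects the 550 mb and 800 mb strings, while the alternation selects only 500 mb and 850 mb. Using `(?:...)` instead of `(...)` also avoids the pandas warning about match groups.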

Here's another example: download all variables at 500 mb.

[23]:
# Download a different subset of the file
H = Herbie(valid_date="2022-3-5 12:00", fxx=0)
H.download(":500 mb:", verbose=True)
🏋🏻‍♂️ Found 2022-Mar-05 12:00 UTC F00 [HRRR] [product=sfc] GRIB2 file from aws and index file from aws.
👨🏻‍🏭 Created directory: [C:\Users\blayl_depgywe\data\hrrr\20220305]
📇 Download subset: [HRRR] model [sfc] product run at 2022-Mar-05 12:00 UTC F00
 cURL from https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20220305/conus/hrrr.t12z.wrfsfcf00.grib2
  14  :HGT:500 mb:anl
  15  :TMP:500 mb:anl
  16  :DPT:500 mb:anl
  17  :UGRD:500 mb:anl
  18  :VGRD:500 mb:anl
💾 Saved the above subset to C:\Users\blayl_depgywe\data\hrrr\20220305\hrrr.t12z.wrfsfcf00.grib2.subset_8a05a62e3e874603b5d6b37737904e0ec526ce1a

Herbie creates a unique filename for each subset file when it is downloaded.

[24]:
# Show path to subset file. You should check if this path exists or not.
H.get_localFilePath(":500 mb")
[24]:
WindowsPath('C:/Users/blayl_depgywe/data/hrrr/20220305/hrrr.t12z.wrfsfcf00.grib2.subset_8a05a62e3e874603b5d6b37737904e0ec526ce1a')
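
One plausible way to get a filename that is unique per subset is to hash the list of selected GRIB messages and append the digest to the full filename. The suffix shown above is 40 hexadecimal characters, which is the length of a SHA-1 digest, but what exactly Herbie feeds into the hash is not shown on this page. The sketch below is therefore an assumption for illustration only, and `subset_name` is a hypothetical helper.

```python
import hashlib


def subset_name(grib_name, messages):
    """Build a subset filename by hashing the selected message descriptions (illustrative only)."""
    digest = hashlib.sha1("".join(messages).encode("utf-8")).hexdigest()
    return f"{grib_name}.subset_{digest}"


messages_500mb = [
    ":HGT:500 mb:anl",
    ":TMP:500 mb:anl",
    ":DPT:500 mb:anl",
    ":UGRD:500 mb:anl",
    ":VGRD:500 mb:anl",
]

name_a = subset_name("hrrr.t12z.wrfsfcf00.grib2", messages_500mb)
name_b = subset_name("hrrr.t12z.wrfsfcf00.grib2", messages_500mb)
name_c = subset_name("hrrr.t12z.wrfsfcf00.grib2", [":TMP:2 m above ground:anl"])
print(name_a)
print(name_a == name_b, name_a == name_c)
```

The useful property is that the same selection always maps to the same filename, while a different selection maps to a different one, so subsets of the same GRIB2 file do not overwrite each other.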

Read GRIB2 file with xarray

Herbie can read GRIB2 files with xarray via cfgrib. By default, if the requested file does not already exist on local disk, Herbie deletes the downloaded file after the data is loaded into memory. (This works on Linux; removing the file does not work on Windows.)

[29]:
# Read a file with xarray that does not exist on disk
H = Herbie("2022-4-2 06:00", fxx=0)
Hx = H.xarray(":500 mb", verbose=True)
Hx
🏋🏻‍♂️ Found 2022-Apr-02 06:00 UTC F00 [HRRR] [product=sfc] GRIB2 file from aws and index file from aws.
👨🏻‍🏭 Created directory: [C:\Users\blayl_depgywe\data\hrrr\20220402]
📇 Download subset: [HRRR] model [sfc] product run at 2022-Apr-02 06:00 UTC F00
 cURL from https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20220402/conus/hrrr.t06z.wrfsfcf00.grib2
  14  :HGT:500 mb:anl
  15  :TMP:500 mb:anl
  16  :DPT:500 mb:anl
  17  :UGRD:500 mb:anl
  18  :VGRD:500 mb:anl
💾 Saved the above subset to C:\Users\blayl_depgywe\data\hrrr\20220402\hrrr.t06z.wrfsfcf00.grib2.subset_8a05a62e3e874603b5d6b37737904e0ec526ce1a
C:\Users\blayl_depgywe\BB_python\Herbie\herbie\archive.py:950: UserWarning: sorry, on windows I couldn't remove the file.
  warnings.warn("sorry, on windows I couldn't remove the file.")
[29]:
<xarray.Dataset>
Dimensions:              (y: 1059, x: 1799)
Coordinates:
    time                 datetime64[ns] 2022-04-02T06:00:00
    step                 timedelta64[ns] 00:00:00
    isobaricInhPa        float64 500.0
    latitude             (y, x) float64 21.14 21.15 21.15 ... 47.86 47.85 47.84
    longitude            (y, x) float64 237.3 237.3 237.3 ... 299.0 299.0 299.1
    valid_time           datetime64[ns] 2022-04-02T06:00:00
Dimensions without coordinates: y, x
Data variables:
    t                    (y, x) float32 267.3 267.3 267.3 ... 250.7 250.8 250.8
    u                    (y, x) float32 6.373 6.373 6.373 ... -2.002 -1.94 -1.94
    v                    (y, x) float32 -4.079 -4.079 -4.079 ... 18.8 19.17
    gh                   (y, x) float32 5.853e+03 5.853e+03 ... 5.308e+03
    dpt                  (y, x) float32 244.5 244.4 244.2 ... 229.5 229.4 229.2
    gribfile_projection  object None
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    model:                   hrrr
    product:                 sfc
    description:             High-Resolution Rapid Refresh - CONUS
    remote_grib:             https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr....
    local_grib:              C:\Users\blayl_depgywe\data\hrrr\20220402\hrrr.t...
    searchString:            :500 mb
[30]:
# On Linux the local subset file would no longer exist because it was removed.
# On Windows it could not be removed (see the warning above), so it still exists.
Hx.attrs["local_grib"].exists()
[30]:
True
[27]:
# You can tell Herbie not to delete the GRIB2 file
H = Herbie("2021-5-6", fxx=0)
Hx = H.xarray(":500 mb", remove_grib=False)
Hx
🏋🏻‍♂️ Found 2021-May-06 00:00 UTC F00 [HRRR] [product=sfc] GRIB2 file from aws and index file from aws.
👨🏻‍🏭 Created directory: [C:\Users\blayl_depgywe\data\hrrr\20210506]
[27]:
<xarray.Dataset>
Dimensions:              (y: 1059, x: 1799)
Coordinates:
    time                 datetime64[ns] 2021-05-06
    step                 timedelta64[ns] 00:00:00
    isobaricInhPa        float64 500.0
    latitude             (y, x) float64 21.14 21.15 21.15 ... 47.86 47.85 47.84
    longitude            (y, x) float64 237.3 237.3 237.3 ... 299.0 299.0 299.1
    valid_time           datetime64[ns] 2021-05-06
Dimensions without coordinates: y, x
Data variables:
    t                    (y, x) float32 ...
    u                    (y, x) float32 ...
    v                    (y, x) float32 ...
    gh                   (y, x) float32 ...
    dpt                  (y, x) float32 ...
    gribfile_projection  object None
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    model:                   hrrr
    product:                 sfc
    description:             High-Resolution Rapid Refresh - CONUS
    remote_grib:             https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr....
    local_grib:              C:\Users\blayl_depgywe\data\hrrr\20210506\hrrr.t...
    searchString:            :500 mb
[31]:
# The local grib file does exist
Hx.attrs["local_grib"].exists()
[31]:
True
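
The clean-up rule described in this section (remove the file only if it did not exist before it was downloaded, and only when removal is requested) can be sketched in a few lines. This is an illustration of the rule, not Herbie's code: `read_with_cleanup`, `fake_download`, and `fake_read` are hypothetical, and a temporary directory stands in for the data directory.

```python
import tempfile
from pathlib import Path


def read_with_cleanup(path, download, read, remove_grib=True):
    """Read `path`, downloading it first if needed; delete it afterwards only if we created it."""
    existed_before = path.exists()
    if not existed_before:
        download(path)
    data = read(path)
    if remove_grib and not existed_before:
        path.unlink()
    return data


def fake_download(path):
    # Stand-in for a real download: write a few bytes to disk.
    path.write_bytes(b"GRIB")


def fake_read(path):
    # Stand-in for loading the file into memory.
    return path.read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.grib2.subset"

    # Case 1: the file is not on disk yet, so it is removed after reading.
    data = read_with_cleanup(target, fake_download, fake_read)
    removed_after_read = not target.exists()

    # Case 2: the file already exists, so it is left in place.
    target.write_bytes(b"GRIB")
    read_with_cleanup(target, fake_download, fake_read)
    kept_when_preexisting = target.exists()

    # Case 3: removal is switched off, so a freshly downloaded file is kept.
    target.unlink()
    read_with_cleanup(target, fake_download, fake_read, remove_grib=False)
    kept_when_disabled = target.exists()

print(removed_after_read, kept_when_preexisting, kept_when_disabled)
```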

Working with multiple Herbie objects

Use the fast Herbie functions when you want to work with multiple files at once. Fast Herbie uses multithreading to speed up what would otherwise be the sequential creation of Herbie objects.

Creating a Herbie object generates a lot of network traffic: Herbie checks whether a GRIB2 file exists at many different remote archives and also looks for index files. Downloading the GRIB2 files is also done mainly over the network, and reading the data into xarray depends on those downloads.

Multithreading is useful for I/O-bound tasks. As I understand it, communication across the internet falls under this category, so multiple threads can be helpful when working with many Herbie objects:

  • Creating lots of Herbie objects (herbie.tools.fast_Herbie)

  • Downloading lots of files (herbie.tools.fast_Herbie_download)

  • Loading lots of files into xarray (herbie.tools.fast_Herbie_xarray)
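
The general pattern behind this kind of speed-up can be sketched with the standard library alone. The example below is not how the herbie.tools functions are implemented (that is not shown on this page); it only illustrates running many I/O-bound lookups in a thread pool, one per combination of run time and lead time. `locate` is a hypothetical stand-in that sleeps instead of contacting a server.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta
from itertools import product

# Six hourly model runs and seven lead times give 6 x 7 = 42 combinations.
dates = [datetime(2022, 1, 1) + timedelta(hours=h) for h in range(6)]
lead_times = [0, 1, 2, 3, 4, 5, 6]


def locate(date, lead):
    """Stand-in for an I/O-bound lookup such as checking remote archives for a file."""
    time.sleep(0.01)  # pretend to wait on the network
    return f"{date:%Y-%m-%d %H:%M} UTC F{lead:02d}"


tasks = list(product(dates, lead_times))

# Threads overlap the waiting time; pool.map returns results in task order.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda task: locate(*task), tasks))

print(len(results))  # 42
print(results[0])    # 2022-01-01 00:00 UTC F00
print(results[-1])   # 2022-01-01 05:00 UTC F06
```

Because each task spends most of its time waiting, eight threads finish the 42 lookups in roughly the time six sequential lookups would take.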

[32]:
from herbie.tools import fast_Herbie, fast_Herbie_download, fast_Herbie_xarray
import pandas as pd
[33]:
# Use pandas to create a list of Datetimes
DATES = pd.date_range("2022-01-01", periods=6, freq="1H")

# Create a list of forecast lead times
fxx = [0, 1, 2, 3, 4, 5, 6]

Fast Herbie

Create many Herbie objects for all the dates and lead times requested.

[34]:
# Create many Herbie objects (for all dates and lead times requested)
HH = fast_Herbie(DATES=DATES, fxx=fxx)
len(HH), HH
[34]:
(42,
 [[HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F00,
  [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F01,
  [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F02,
  [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F03,
  [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F04,
  [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F05,
  [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F06,
  [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F00,
  [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F01,
  [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F02,
  [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F03,
  [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F04,
  [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F05,
  [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F06,
  [HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F00,
  [HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F01,
  [HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F02,
  [HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F03,
  [HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F04,
  [HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F05,
  [HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F06,
  [HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F00,
  [HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F01,
  [HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F02,
  [HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F03,
  [HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F04,
  [HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F05,
  [HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F06,
  [HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F00,
  [HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F01,
  [HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F02,
  [HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F03,
  [HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F04,
  [HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F05,
  [HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F06,
  [HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F00,
  [HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F01,
  [HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F02,
  [HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F03,
  [HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F04,
  [HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F05,
  [HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F06])

Fast Herbie Download

Download a subset of each file for many Herbie objects.

[35]:
# Download those Herbie objects (subset by 2-m temperature)
a = fast_Herbie_download(DATES=DATES, fxx=fxx, searchString="TMP:2 m")
a
👨🏻‍🏭 Created directory: [C:\Users\blayl_depgywe\data\hrrr\20220101]
[35]:
{'passed': [[HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F00,
  [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F01,
  [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F02,
  [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F03,
  [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F04,
  [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F05,
  [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F06,
  [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F00,
  [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F01,
  [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F02,
  [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F03,
  [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F04,
  [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F05,
  [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F06,
  [HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F00,
  [HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F01,
  [HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F02,
  [HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F03,
  [HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F04,
  [HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F05,
  [HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F06,
  [HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F00,
  [HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F01,
  [HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F02,
  [HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F03,
  [HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F04,
  [HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F05,
  [HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F06,
  [HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F00,
  [HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F01,
  [HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F02,
  [HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F03,
  [HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F04,
  [HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F05,
  [HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F06,
  [HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F00,
  [HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F01,
  [HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F02,
  [HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F03,
  [HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F04,
  [HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F05,
  [HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F06],
 'failed': []}
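
The returned dictionary separates the Herbie objects whose download succeeded (`'passed'`) from those that did not (`'failed'`). Below is a minimal sketch of how such a result could be summarized. The `summarize` helper is hypothetical (it is not part of Herbie), and plain strings stand in for Herbie objects so the snippet runs without network access.

```python
def summarize(result):
    """Count successes and failures in a {'passed': [...], 'failed': [...]} dict."""
    n_passed = len(result["passed"])
    n_failed = len(result["failed"])
    return {
        "passed": n_passed,
        "failed": n_failed,
        "total": n_passed + n_failed,
        "all_ok": n_failed == 0,
    }


# Stand-in for the output above: 6 initialization hours x 7 lead times (F00-F06).
result = {
    "passed": [
        f"2022-01-01 {hour:02d}:00 UTC F{fxx:02d}"
        for hour in range(6)
        for fxx in range(7)
    ],
    "failed": [],
}

summary = summarize(result)
print(summary)
```

A non-empty `'failed'` list is the natural place to look when some files are missing from an archive or a request times out.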

Fast Herbie xarray#

Read the data into an xarray Dataset. Notice that this concatenates all the files along the initialization datetime (t) and lead time (f) dimensions.
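
Each (t, f) pair corresponds to one GRIB2 file, and its valid time is the initialization time plus the lead time. Here is a small standard-library sketch of that grid, assuming the three hourly initialization times and the two lead times (0 and 6 hours) requested in the cell below.

```python
from datetime import datetime, timedelta

init_times = [datetime(2022, 1, 1, hour) for hour in range(3)]  # t dimension
lead_times = [timedelta(hours=h) for h in (0, 6)]  # f dimension

# valid_time[f][t] = initialization time + lead time
valid_time = [[init + lead for init in init_times] for lead in lead_times]

for lead, row in zip(lead_times, valid_time):
    print(lead, [v.strftime("%Y-%m-%d %H:%M") for v in row])
```

The resulting 2 x 3 grid has the same (f, t) layout as the `valid_time` coordinate in the Dataset output.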

NOTE: The searchString must return data on the same hypercube (data must be on the same type of level; see cfgrib for more details). For instance, you shouldn't load 2-m and 500 hPa data in the same object.

WARNING: You could run into memory limits if you request too much data.
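
The `searchString` in the next cell, `(?:U|V)GRD:10 m`, reads like a regular expression. The sketch below shows what such a pattern selects, assuming it is applied as a regex search against each line of a GRIB2 index file. The sample lines are illustrative only (message numbers and byte offsets are made up), and the `10 mb` line is included purely to show a possible false match.

```python
import re

# Illustrative index-style lines; not copied from a real index file.
sample_index_lines = [
    "1:0:d=2022010100:TMP:2 m above ground:anl:",
    "2:100:d=2022010100:UGRD:10 m above ground:anl:",
    "3:200:d=2022010100:VGRD:10 m above ground:anl:",
    "4:300:d=2022010100:UGRD:80 m above ground:anl:",
    "5:400:d=2022010100:HGT:500 mb:anl:",
    "6:500:d=2022010100:UGRD:10 mb:anl:",
]

loose = re.compile(r"(?:U|V)GRD:10 m")
strict = re.compile(r"(?:U|V)GRD:10 m above ground")

loose_matches = [line for line in sample_index_lines if loose.search(line)]
strict_matches = [line for line in sample_index_lines if strict.search(line)]

print("loose: ", [line.split(":")[0] for line in loose_matches])
print("strict:", [line.split(":")[0] for line in strict_matches])
```

The non-capturing group `(?:U|V)` selects both wind components in one pattern. Because `10 m` is also a prefix of `10 mb`, spelling out more of the level text makes the match stricter when a file contains both kinds of level.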

[36]:
# Read into xarray
ds = fast_Herbie_xarray(DATES=DATES[:3], fxx=[0, 6], searchString="(?:U|V)GRD:10 m")
ds
C:\Users\blayl_depgywe\BB_python\Herbie\herbie\archive.py:950: UserWarning: sorry, on windows I couldn't remove the file.
  warnings.warn("sorry, on windows I couldn't remove the file.")
C:\Users\blayl_depgywe\BB_python\Herbie\herbie\archive.py:950: UserWarning: sorry, on windows I couldn't remove the file.
  warnings.warn("sorry, on windows I couldn't remove the file.")
C:\Users\blayl_depgywe\BB_python\Herbie\herbie\archive.py:950: UserWarning: sorry, on windows I couldn't remove the file.
  warnings.warn("sorry, on windows I couldn't remove the file.")
C:\Users\blayl_depgywe\BB_python\Herbie\herbie\archive.py:950: UserWarning: sorry, on windows I couldn't remove the file.
  warnings.warn("sorry, on windows I couldn't remove the file.")
C:\Users\blayl_depgywe\BB_python\Herbie\herbie\archive.py:950: UserWarning: sorry, on windows I couldn't remove the file.
  warnings.warn("sorry, on windows I couldn't remove the file.")
[36]:
<xarray.Dataset>
Dimensions:              (t: 3, f: 2, y: 1059, x: 1799)
Coordinates:
    time                 (t) datetime64[ns] 2022-01-01 ... 2022-01-01T02:00:00
    step                 (f) timedelta64[ns] 00:00:00 06:00:00
    heightAboveGround    float64 10.0
    latitude             (y, x) float64 21.14 21.15 21.15 ... 47.86 47.85 47.84
    longitude            (y, x) float64 237.3 237.3 237.3 ... 299.0 299.0 299.1
    valid_time           (f, t) datetime64[ns] 2022-01-01 ... 2022-01-01T08:0...
Dimensions without coordinates: t, f, y, x
Data variables:
    u10                  (f, t, y, x) float32 -3.392 -3.33 ... -1.313 -1.375
    v10                  (f, t, y, x) float32 -5.777 -5.777 ... 2.929 2.991
    gribfile_projection  (f, t) object None None None None None None
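
To put the memory warning in perspective, the size of a request can be estimated from its dimensions before loading it. This is a back-of-the-envelope sketch using the dimensions printed above and 4 bytes per float32 value; the `estimate_bytes` helper is hypothetical and ignores coordinates and metadata.

```python
from math import prod


def estimate_bytes(shape, itemsize):
    """Approximate in-memory size of a dense array with the given shape."""
    return prod(shape) * itemsize


# Dimensions of the Dataset above: 2 lead times, 3 init times, 1059 x 1799 grid.
dims = {"f": 2, "t": 3, "y": 1059, "x": 1799}

per_variable = estimate_bytes(dims.values(), itemsize=4)  # float32
wind_total = 2 * per_variable  # u10 and v10

print(f"{per_variable / 1024**2:.1f} MiB per variable")
print(f"{wind_total / 1024**2:.1f} MiB for u10 + v10")
```

The estimate grows linearly with the number of files, so requesting all 42 combinations of initialization time and lead time instead of these 6 would need about 7 times as much memory. For a Dataset that is already loaded, xarray reports its size through `ds.nbytes`.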

Plot with herbie xarray custom accessor#

🏗️ WORK IN PROGRESS

This requires my Carpenter Workshop functions.

[24]:
Hx.herbie.plot()
cfgrib variable: t
GRIB_cfName air_temperature
GRIB_cfVarName t
GRIB_name Temperature
GRIB_units K
GRIB_typeOfLevel isobaricInhPa

/p/home/blaylock/anaconda3/envs/basic38/lib/python3.8/site-packages/cartopy/mpl/geoaxes.py:1702: UserWarning: The input coordinates to pcolormesh are interpreted as cell centers, but are not monotonically increasing or decreasing. This may lead to incorrectly calculated cell edges, in which case, please supply explicit cell edges to pcolormesh.
  X, Y, C, shading = self._pcolorargs('pcolormesh', *args,
cfgrib variable: u
GRIB_cfName eastward_wind
GRIB_cfVarName u
GRIB_name U component of wind
GRIB_units m s**-1
GRIB_typeOfLevel isobaricInhPa

cfgrib variable: v
GRIB_cfName northward_wind
GRIB_cfVarName v
GRIB_name V component of wind
GRIB_units m s**-1
GRIB_typeOfLevel isobaricInhPa

cfgrib variable: gh
GRIB_cfName geopotential_height
GRIB_cfVarName gh
GRIB_name Geopotential Height
GRIB_units gpm
GRIB_typeOfLevel isobaricInhPa

cfgrib variable: dpt
GRIB_cfName unknown
GRIB_cfVarName dpt
GRIB_name Dew point temperature
GRIB_units K
GRIB_typeOfLevel isobaricInhPa

[24]:
<GeoAxesSubplot:title={'left':'Run: 00:00 UTC 06 May 2021 F00','center':'HRRR 500 hPa','right':'Valid: 00:00 UTC 06 May 2021'}>
[Figures: five plot images, user_guide_notebooks_tutorial_52_4.png through user_guide_notebooks_tutorial_52_8.png]