herbie.archive.Herbie

class herbie.archive.Herbie(date=None, *, valid_date=None, model='hrrr', fxx=0, product=None, priority=None, save_dir=WindowsPath('C:/Users/blayl_depgywe/data'), overwrite=False, verbose=True, **kwargs)

Locate GRIB2 file at one of the archive sources.

Parameters:
  • date (pandas-parsable datetime) – Model initialization datetime. If None, then must set valid_date.

  • valid_date (pandas-parsable datetime) – Model valid datetime. Must set when date is None.

  • fxx (int) – Forecast lead time in hours. Available lead times depend on the model, the model version, and the run.

  • model ({'hrrr', 'hrrrak', 'rap', 'gfs', 'gfs_wave', 'ecmwf', 'rrfs', etc.}) – Model name as defined in the models template folder. CASE INSENSITIVE. Some examples:
      - 'hrrr' HRRR contiguous United States model
      - 'hrrrak' HRRR Alaska model (alias 'alaska')
      - 'rap' RAP model
      - 'ecmwf' ECMWF open data forecast products

  • product ({'sfc', 'prs', 'nat', 'subh'}) – Output variable product file type. If not specified, will use the first product in the model template file. CASE SENSITIVE. For example, the HRRR model has these products:
      - 'sfc' surface fields
      - 'prs' pressure fields
      - 'nat' native fields
      - 'subh' subhourly fields

  • member (None or int) – Some ensemble models (e.g. the future RRFS) will need to specify an ensemble member.

  • priority (list or str) – List of model sources to get the data from, in order of download priority. CASE INSENSITIVE. Some example data sources and the default priority order are listed below.
      - 'aws' Amazon Web Services (Big Data Program)
      - 'nomads' NOAA’s NOMADS server
      - 'google' Google Cloud Platform (Big Data Program)
      - 'azure' Microsoft Azure (Big Data Program)
      - 'pando' University of Utah Pando Archive (gateway 1)
      - 'pando2' University of Utah Pando Archive (gateway 2)

  • save_dir (str or pathlib.Path) – Location to save GRIB2 files locally. Default save directory is set in ~/.config/herbie/config.cfg.

  • overwrite (bool) – If True, look for GRIB2 files even if a local copy exists. If False (default), use the local copy (the idx file still needs to be found).

  • **kwargs – Any other parameter needed to satisfy the conditions in the model template file (e.g., nest=2, other_label='run2')
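
The relationship between date, valid_date, and fxx can be sketched with the standard library. This is a minimal illustration that assumes the valid time equals the initialization time plus the lead time in hours. The helper name resolve_init_date is hypothetical and is not part of Herbie.

```python
from datetime import datetime, timedelta


def resolve_init_date(date=None, valid_date=None, fxx=0):
    """Return (init_date, valid_date) from either one plus a lead time in hours.

    Hypothetical helper for illustration only; not Herbie's implementation.
    """
    if date is None and valid_date is None:
        raise ValueError("Set either date or valid_date.")
    lead = timedelta(hours=fxx)
    if date is not None:
        # Initialization time is known: the valid time is later by the lead time.
        return date, date + lead
    # Only the valid time is known: step back by the lead time.
    return valid_date - lead, valid_date


init, valid = resolve_init_date(valid_date=datetime(2022, 1, 1, 12), fxx=6)
print(init)   # 2022-01-01 06:00:00
print(valid)  # 2022-01-01 12:00:00
```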

__init__(date=None, *, valid_date=None, model='hrrr', fxx=0, product=None, priority=None, save_dir=WindowsPath('C:/Users/blayl_depgywe/data'), overwrite=False, verbose=True, **kwargs)

Specify model output and find GRIB2 file at one of the sources.

Methods

__init__([date, valid_date, model, fxx, ...])

Specify model output and find GRIB2 file at one of the sources.

download([searchString, source, save_dir, ...])

Download file from source.

find_grib([overwrite])

Find a GRIB file from the archive sources

find_idx()

Find an index file for the GRIB file

get_localFilePath([searchString])

Get path to local file

read_idx([searchString])

Inspect the GRIB2 file contents by reading the index file.

tell_me_everything()

Print all the attributes of the Herbie object

xarray([searchString, backend_kwargs, ...])

Open GRIB2 data as xarray Dataset

Attributes

get_localFileName

Predict Local File Name of the full file

get_remoteFileName

Predict Remote File Name

index_as_dataframe

Read and cache the full index file

download(searchString=None, *, source=None, save_dir=None, overwrite=None, verbose=None, errors='warn')

Download file from source.

TODO: When we download a full file, the values of self.grib and self.grib_source should change to represent the local file.

Subsetting by variable follows the same principles described here: https://www.cpc.ncep.noaa.gov/products/wesley/fast_downloading_grib.html

Parameters:
  • searchString (str) – If None, download the full file. Else, use regex to subset the file by specific variables and levels.

  • source ({'nomads', 'aws', 'google', 'azure', 'pando', 'pando2'}) – If None, download the GRIB2 file from self.grib, which is the first location where the file was found in the priority list when this class was initialized. Else, specify a source to force downloading from a different location.

  • save_dir (str or pathlib.Path) – Location to save the model output files. If None, uses the default or path specified in __init__. Else, changes the path files are saved.

  • overwrite (bool) – If True, overwrite existing files. Default will skip downloading if the full file exists. Not applicable when searchString is not None because file subsets might be unique.

  • errors ({'warn', 'raise'}) – When an error occurs, either issue a warning or raise a ValueError.
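
The warn-or-raise behavior selected by errors can be sketched as below. This is a generic pattern written with the standard library, not Herbie's code. The function name report_problem and the message text are made up for the example.

```python
import warnings


def report_problem(message, errors="warn"):
    """Warn or raise depending on the errors setting (hypothetical helper)."""
    if errors == "raise":
        raise ValueError(message)
    if errors == "warn":
        warnings.warn(message)
        return None
    raise ValueError(f"errors must be 'warn' or 'raise', got {errors!r}")


# With errors="warn" the caller keeps running and can check for a None result.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = report_problem("GRIB2 file not found", errors="warn")

print(result)       # None
print(len(caught))  # 1
```

Choosing 'raise' is useful in batch jobs where a missing file should stop the run, while 'warn' lets a loop over many dates continue past gaps.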

find_grib(overwrite=False)

Find a GRIB file from the archive sources

Returns:

  • 1) The URL or pathlib.Path of the GRIB2 file that exists

  • 2) The source of the GRIB2 file
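
Searching sources in priority order and returning the first hit, together with the name of its source, can be sketched as follows. This only illustrates the idea and is not Herbie's implementation. The function first_available, the example.com URLs, and the stand-in existence check are all hypothetical.

```python
def first_available(candidates, exists):
    """Return (location, source) for the first candidate that exists.

    candidates: ordered list of (source, location) pairs, highest priority first.
    exists: callable that takes a location and returns True if the file is there.
    Hypothetical helper; not Herbie's implementation.
    """
    for source, location in candidates:
        if exists(location):
            return location, source
    return None, None


# Made-up URLs, with a set lookup standing in for a real network request.
candidates = [
    ("aws", "https://example.com/aws/file.grib2"),
    ("nomads", "https://example.com/nomads/file.grib2"),
    ("google", "https://example.com/google/file.grib2"),
]
available = {
    "https://example.com/nomads/file.grib2",
    "https://example.com/google/file.grib2",
}

grib, grib_source = first_available(candidates, lambda url: url in available)
print(grib_source)  # nomads
```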

find_idx()

Find an index file for the GRIB file

property get_localFileName

Predict Local File Name of the full file

get_localFilePath(searchString=None)

Get path to local file

property get_remoteFileName

Predict Remote File Name

property index_as_dataframe

Read and cache the full index file

read_idx(searchString=None)

Inspect the GRIB2 file contents by reading the index file.

This reads index files created with the wgrib2 utility.

Parameters:

searchString (str) –

Filter dataframe by a searchString regular expression. Searches for strings in the index file lines, specifically the variable, level, and forecast_time columns. Execute _searchString_help() for examples of a good searchString.

Subsetting is done using the GRIB2 index files. Index files define the grib variables/parameters of each message (sometimes it is useful to think of a grib message as a “layer” of the file) and define the byte range of the message.

Herbie can subset a file by grib message by downloading a byte range of the file. This way, instead of downloading the full file, you can download just the “layer” of the file you want. The searchString method implemented in Herbie to do a partial download is similar to what is explained here: https://www.cpc.ncep.noaa.gov/products/wesley/fast_downloading_grib.html
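
The byte-range idea can be sketched with a few hand-written index lines. The lines below follow the wgrib2 index layout (message number, start byte, reference date, variable, level, forecast time), but the byte offsets are invented, and the helper byte_ranges is a hypothetical illustration rather than Herbie's code. Each message ends one byte before the next message starts.

```python
import re

# Hand-written lines in the wgrib2 index layout (offsets are made up):
# message number : start byte : reference date : variable : level : forecast time
INDEX_TEXT = """\
1:0:d=2022010100:REFC:entire atmosphere:anl:
2:375285:d=2022010100:TMP:2 m above ground:anl:
3:1653212:d=2022010100:UGRD:10 m above ground:anl:
4:2802119:d=2022010100:VGRD:10 m above ground:anl:
"""


def byte_ranges(index_text, pattern):
    """Return HTTP Range header values for messages whose index line matches."""
    lines = index_text.strip().splitlines()
    starts = [int(line.split(":")[1]) for line in lines]
    ranges = []
    for i, line in enumerate(lines):
        if not re.search(pattern, line):
            continue
        # A message ends one byte before the next message starts.
        # The last message has no successor, so its range is left open-ended.
        end = starts[i + 1] - 1 if i + 1 < len(lines) else ""
        ranges.append(f"bytes={starts[i]}-{end}")
    return ranges


print(byte_ranges(INDEX_TEXT, r":TMP:2 m"))
# ['bytes=375285-1653211']
print(byte_ranges(INDEX_TEXT, r":(U|V)GRD:10 m"))
# ['bytes=1653212-2802118', 'bytes=2802119-']
```

Each returned string could be sent as the value of an HTTP Range request header to fetch only that message from the remote file.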

Herbie supports reading two different types of index files:

  1. Index files output by the wgrib2 command-line utility. These index files are common for forecast models provided by NCEP.

  2. Index files output by the ecCodes/grib_ls command-line utility. These index files are common for forecast models provided by ECMWF.

You can use a regular expression to search for lines in the index file. If H is a Herbie object, the regex search is performed on the H.read_idx().search_this column of the DataFrame.
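
A quick way to try out a pattern is to run it against a few sample strings with Python's re module. The strings below are hand-written and only assume that each searchable entry looks like ":variable:level:forecast_time". The exact contents of the search_this column may differ, and the helper matching is hypothetical.

```python
import re

# Hand-written entries resembling ":variable:level:forecast_time" (an assumption).
search_this = [
    ":TMP:2 m above ground:anl",
    ":TMP:500 mb:anl",
    ":UGRD:10 m above ground:anl",
    ":VGRD:80 m above ground:anl",
    ":UGRD:500 mb:anl",
    ":APCP:surface:0-1 hour acc fcst",
]


def matching(pattern):
    """Return the entries that the regular expression matches anywhere."""
    return [s for s in search_this if re.search(pattern, s)]


print(matching(r":TMP:2 m"))             # 2 m temperature only
print(matching(r":(U|V)GRD:(10|80) m"))  # U and V wind at 10 m and 80 m
print(matching(r":500 mb:"))             # every variable on the 500 mb level
```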

Tip

If you need help with regular expressions, search the web or look at a regex cheatsheet. Check regular expressions with regexr or regex101.

Here are some examples you can use for the searchString argument for the wgrib2-style index files.

============================== ==============================================================
searchString=                  GRIB messages that will be downloaded
============================== ==============================================================
":TMP:2 m"                     Temperature at 2 m.
":TMP:"                        Temperature fields at all levels.
":UGRD:.* mb"                  U wind at all pressure levels.
":500 mb:"                     All variables on the 500 mb level.
":APCP:"                       All accumulated precipitation fields.
":APCP:surface:0-[1-9]*"       Accumulated precip since initialization time.
":APCP:surface:[1-9]*-[1-9]*"  Accumulated precip over last hour.
":UGRD:10 m"                   U wind component at 10 meters.
":(U|V)GRD:(10|80) m"          U and V wind components at 10 and 80 m.
":(U|V)GRD:"                   U and V wind components at all levels.
":(?:U|V)GRD:[0-9]+ hybrid"    U and V wind components at all hybrid levels.
":(?:U|V)GRD:[0-9]+ mb"        U and V wind components at all pressure levels.
":.GRD:"                       U and V wind components at all levels (same as ":(U|V)GRD:").
":(TMP|DPT):"                  Temperature and dew point at all levels.
":(TMP|DPT|RH):"               TMP, DPT, and relative humidity at all levels.
":REFC:"                       Composite reflectivity.
":surface:"                    All variables at the surface.
"^TMP:2 m.*fcst$"              Beginning of string (^), end of string ($), wildcard (.*).
============================== ==============================================================

Hint

The NCEP Parameters & Units Table is a useful resource to help you identify wgrib2-style GRIB variable abbreviations and their meanings.

Here are some examples you can use for the searchString argument for the grib_ls-style index files.

Look at the ECMWF GRIB Parameter Database https://apps.ecmwf.int/codes/grib/param-db

This table is for the operational forecast product (and ensemble product):

======================== ==============================================
searchString (oper/enfo) Messages that will be downloaded
======================== ==============================================
":2t:"                   2-m temperature
":10u:"                  10-m u wind vector
":10v:"                  10-m v wind vector
":10(u|v):"              10-m u and 10-m v wind
":d:"                    Divergence (all levels)
":gh:"                   Geopotential height (all levels)
":gh:500"                Geopotential height only at 500 hPa
":st:"                   Soil temperature
":tp:"                   Total precipitation
":msl:"                  Mean sea level pressure
":q:"                    Specific humidity
":r:"                    Relative humidity
":ro:"                   Runoff
":skt:"                  Skin temperature
":sp:"                   Surface pressure
":t:"                    Temperature
":tcwv:"                 Total column vertically integrated water vapor
":vo:"                   Relative vorticity
":v:"                    v wind vector
":u:"                    u wind vector
":(t|u|v|r):"            Temp, u/v wind, RH (all levels)
":500:"                  All variables on the 500 hPa level
======================== ==============================================

This table is for the wave product (and ensemble wave product):

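======================== ========================================
searchString (wave/waef) Messages that will be downloaded
======================== ========================================
":swh:"                  Significant height of wind waves + swell
":mwp:"                  Mean wave period
":mwd:"                  Mean wave direction
":pp1d:"                 Peak wave period
":mp2:"                  Mean zero-crossing wave period
======================== ========================================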

Hint

The ECMWF Parameter Database is a useful resource to help you identify ecCodes-style GRIB variable abbreviations and their meanings.
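
For the ECMWF-style index files, the byte range of a message can be worked out as sketched below. This assumes the index is stored as one JSON object per line with "_offset" and "_length" keys, which is how ECMWF open data index files are commonly laid out; that layout is an assumption and is not stated in this reference. The sample lines, their byte values, and the helper byte_ranges are made up for illustration.

```python
import json

# Two hand-written lines in the assumed JSON-lines layout (values are made up).
INDEX_TEXT = """\
{"param": "2t", "levtype": "sfc", "step": "0", "_offset": 0, "_length": 609046}
{"param": "gh", "levtype": "pl", "levelist": "500", "step": "0", "_offset": 609046, "_length": 401912}
"""


def byte_ranges(index_text, param):
    """Return HTTP Range header values for messages with the given param."""
    ranges = []
    for line in index_text.strip().splitlines():
        entry = json.loads(line)
        if entry["param"] == param:
            start = entry["_offset"]
            end = start + entry["_length"] - 1  # the Range end byte is inclusive
            ranges.append(f"bytes={start}-{end}")
    return ranges


print(byte_ranges(INDEX_TEXT, "gh"))  # ['bytes=609046-1010957']
```

Unlike the wgrib2 layout, each entry here carries its own length, so the end byte does not depend on the next message.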

Return type:

A Pandas DataFrame of the index file.

tell_me_everything()

Print all the attributes of the Herbie object

xarray(searchString=None, backend_kwargs={}, remove_grib=True, **download_kwargs)

Open GRIB2 data as xarray Dataset

Parameters:
  • searchString (str) – Variables to read into xarray Dataset

  • remove_grib (bool) – If True, grib file will be removed ONLY IF it didn’t exist before we downloaded it.
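
The remove_grib rule (delete the file only if this call created it) can be sketched with the standard library. This is a generic pattern, not Herbie's implementation. The helper read_then_cleanup and the fetch callable, which stands in for a download, are hypothetical.

```python
import tempfile
from pathlib import Path


def read_then_cleanup(path, fetch, remove_grib=True):
    """Read a file, deleting it afterwards only if this call created it.

    fetch: callable that writes the file to path (stand-in for a download).
    Hypothetical helper; not Herbie's implementation.
    """
    path = Path(path)
    existed_before = path.exists()
    if not existed_before:
        fetch(path)
    data = path.read_bytes()
    if remove_grib and not existed_before:
        path.unlink()
    return data


with tempfile.TemporaryDirectory() as tmp:
    fresh = Path(tmp) / "fresh.grib2"
    cached = Path(tmp) / "cached.grib2"
    cached.write_bytes(b"GRIB")  # pretend this was downloaded earlier

    read_then_cleanup(fresh, lambda p: p.write_bytes(b"GRIB"))
    read_then_cleanup(cached, lambda p: p.write_bytes(b"GRIB"))

    fresh_kept = fresh.exists()    # False: created by the call, so removed
    cached_kept = cached.exists()  # True: existed beforehand, so kept
```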