Quick Tutorial#
There are two main ways to use Herbie:
1. When working with one file at a time, use the Herbie class imported from herbie.archive to create Herbie objects.
2. When working with many files across different dates and forecast lead times, use the helper functions in herbie.tools.
Creating a Herbie Object#
The Herbie class gives you the details of a single GRIB2 file, with methods to download the file, open it with xarray, and subset it by variable.
What does this class do? When you specify a datetime, model type, and forecast lead time, Herbie searches the different archive sources for the file you are requesting. By default, it searches for the HRRR model (model='hrrr') surface fields (product='sfc') at the zero-hour lead time (fxx=0).
[1]:
from herbie.archive import Herbie
[2]:
help(Herbie)
Help on class Herbie in module herbie.archive:
class Herbie(builtins.object)
| Herbie(date=None, *, valid_date=None, model='hrrr', fxx=0, product=None, priority=None, save_dir=WindowsPath('C:/Users/blayl_depgywe/data'), overwrite=False, verbose=True, **kwargs)
|
| Locate GRIB2 file at one of the archive sources.
|
| Parameters
| ----------
| date : pandas-parsable datetime
| *Model initialization datetime*.
| If None, then must set ``valid_date``.
| valid_date : pandas-parsable datetime
| Model valid datetime. Must set when ``date`` is None.
| fxx : int
| Forecast lead time in hours. Available lead times depend on
| the model type and model version. Range is model and run
| dependant.
| model : {'hrrr', 'hrrrak', 'rap', 'gfs', 'gfs_wave', 'ecmwf', 'rrfs', etc.}
| Model name as defined in the models template folder. CASE INSENSITIVE
| Some examples:
| - ``'hrrr'`` HRRR contiguous United States model
| - ``'hrrrak'`` HRRR Alaska model (alias ``'alaska'``)
| - ``'rap'`` RAP model
| - ``'ecmwf'`` ECMWF open data forecat products
| product : {'sfc', 'prs', 'nat', 'subh'}
| Output variable product file type. If not specified, will
| use first product in model template file. CASE SENSITIVE.
| For example, the HRRR model has these products:
| - ``'sfc'`` surface fields
| - ``'prs'`` pressure fields
| - ``'nat'`` native fields
| - ``'subh'`` subhourly fields
| member : None or int
| Some ensemble models (e.g. the future RRFS) will need to
| specify an ensemble member.
| priority : list or str
| List of model sources to get the data in the order of
| download priority. CASE INSENSITIVE. Some example data
| sources and the default priority order are listed below.
| - ``'aws'`` Amazon Web Services (Big Data Program)
| - ``'nomads'`` NOAA's NOMADS server
| - ``'google'`` Google Cloud Platform (Big Data Program)
| - ``'azure'`` Microsoft Azure (Big Data Program)
| - ``'pando'`` University of Utah Pando Archive (gateway 1)
| - ``'pando2'`` University of Utah Pando Archive (gateway 2)
| save_dir : str or pathlib.Path
| Location to save GRIB2 files locally. Default save directory
| is set in ``~/.config/herbie/config.cfg``.
| Overwrite : bool
| If True, look for GRIB2 files even if local copy exists.
| If False (default), use the local copy (still need to find
| the idx file).
| **kwargs
| Any other paremeter needed to satisfy the conditions in the
| model template file (e.g., nest=2, other_label='run2')
|
| Methods defined here:
|
| __init__(self, date=None, *, valid_date=None, model='hrrr', fxx=0, product=None, priority=None, save_dir=WindowsPath('C:/Users/blayl_depgywe/data'), overwrite=False, verbose=True, **kwargs)
| Specify model output and find GRIB2 file at one of the sources.
|
| __repr__(self)
| Representation in Notebook
|
| __str__(self)
| When Herbie class object is printed, print all properties
|
| download(self, searchString=None, *, source=None, save_dir=None, overwrite=None, verbose=None, errors='warn')
| Download file from source.
|
| Subsetting by variable follows the same principles described here:
| https://www.cpc.ncep.noaa.gov/products/wesley/fast_downloading_grib.html
|
| Parameters
| ----------
| searchString : str
| If None, download the full file. Else, use regex to subset
| the file by specific variables and levels.
| .. include:: ../../user_guide/searchString.rst
| source : {'nomads', 'aws', 'google', 'azure', 'pando', 'pando2'}
| If None, download GRIB2 file from self.grib2 which is
| the first location the GRIB2 file was found from the
| priority lists when this class was initialized. Else, you
| may specify the source to force downloading it from a
| different location.
| save_dir : str or pathlib.Path
| Location to save the model output files.
| If None, uses the default or path specified in __init__.
| Else, changes the path files are saved.
| overwrite : bool
| If True, overwrite existing files. Default will skip
| downloading if the full file exists. Not applicable when
| when searchString is not None because file subsets might
| be unique.
| errors : {'warn', 'raise'}
| When an error occurs, send a warning or raise a value error.
|
| find_grib(self, overwrite=False)
| Find a GRIB file from the archive sources
|
| Returns
| -------
| 1) The URL or pathlib.Path to the GRIB2 files that exists
| 2) The source of the GRIB2 file
|
| find_idx(self)
| Find an index file for the GRIB file
|
| get_localFilePath(self, searchString=None)
| Get path to local file
|
| index_as_dataframe = <functools.cached_property object>
| Read and cache the full index file
|
| read_idx(self, searchString=None)
| Inspect the GRIB2 file contents by reading the index file.
|
| This reads index files created with the wgrib2 utility.
|
| Parameters
| ----------
| searchString : str
| Filter dataframe by a searchString regular expression.
| Searches for strings in the index file lines, specifically
| the variable, level, and forecast_time columns.
| Execute ``_searchString_help()`` for examples of a good
| searchString.
|
| .. include:: ../../user_guide/searchString.rst
|
| Returns
| -------
| A Pandas DataFrame of the index file.
|
| xarray(self, searchString=None, backend_kwargs={}, remove_grib=True, **download_kwargs)
| Open GRIB2 data as xarray DataSet
|
| Parameters
| ----------
| searchString : str
| Variables to read into xarray Dataset
| remove_grib : bool
| If True, grib file will be removed ONLY IF it didn't exist
| before we downloaded it.
|
| ----------------------------------------------------------------------
| Readonly properties defined here:
|
| __logo__
| For Fun, show the Herbie Logo
|
| get_localFileName
| Predict Local File Name
|
| get_remoteFileName
| Predict Remote File Name
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
[3]:
H = Herbie("2022-4-23 00:00")
🏋🏻‍♂️ Found 2022-Apr-23 00:00 UTC F00 [HRRR] [product=sfc] GRIB2 file from aws and index file from aws.
The Herbie object tells us that a file matching our request was found on Amazon Web Services (AWS).
We can display some of the details of the Herbie object by printing it.
[4]:
print(H)
[Herbie ASCII-art logo: Retrieve NWP Model Data]
self.DESCRIPTION=High-Resolution Rapid Refresh - CONUS
self.DETAILS={'NOMADS product description': 'https://www.nco.ncep.noaa.gov/pmb/products/hrrr/', 'University of Utah HRRR archive': 'http://hrrr.chpc.utah.edu/'}
self.EXPECT_IDX_FILE=remote
self.IDX_STYLE=wgrib2
self.LOCALFILE=hrrr.t00z.wrfsfcf00.grib2
self.PRODUCTS={'sfc': '2D surface level fields; 3-km resolution', 'prs': '3D pressure level fields; 3-km resolution', 'nat': 'Native level fields; 3-km resolution', 'subh': 'Subhourly grids; 3-km resolution'}
self.SOURCES={'aws': 'https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20220423/conus/hrrr.t00z.wrfsfcf00.grib2', 'nomads': 'https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/hrrr.20220423/conus/hrrr.t00z.wrfsfcf00.grib2', 'google': 'https://storage.googleapis.com/high-resolution-rapid-refresh/hrrr.20220423/conus/hrrr.t00z.wrfsfcf00.grib2', 'azure': 'https://noaahrrr.blob.core.windows.net/hrrr/hrrr.20220423/conus/hrrr.t00z.wrfsfcf00.grib2', 'pando': 'https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20220423/hrrr.t00z.wrfsfcf00.grib2', 'pando2': 'https://pando-rgw02.chpc.utah.edu/hrrr/sfc/20220423/hrrr.t00z.wrfsfcf00.grib2'}
self.fxx=0
self.get_localFileName=hrrr.t00z.wrfsfcf00.grib2
self.get_remoteFileName=hrrr.t00z.wrfsfcf00.grib2
self.grib=https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20220423/conus/hrrr.t00z.wrfsfcf00.grib2
self.grib_source=aws
self.idx=https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20220423/conus/hrrr.t00z.wrfsfcf00.grib2.idx
self.idx_source=aws
self.model=hrrr
self.overwrite=False
self.product=sfc
self.product_description=2D surface level fields; 3-km resolution
self.searchString_help=
Use regular expression to search for lines in the index file.
Here are some examples you can use for the wgrib2-style `searchString`
============================= ===============================================
``searchString=`` Messages that will be downloaded
============================= ===============================================
":TMP:2 m" Temperature at 2 m.
":TMP:" Temperature fields at all levels.
":UGRD:.* mb" U Wind at all pressure levels.
":500 mb:" All variables on the 500 mb level.
":APCP:" All accumulated precipitation fields.
":APCP:surface:0-[1-9]*" Accumulated precip since initialization time
":APCP:surface:[1-9]*-[1-9]*" Accumulated precip over last hour
":UGRD:10 m" U wind component at 10 meters.
":(U|V)GRD:(10|80) m" U and V wind component at 10 and 80 m.
":(U|V)GRD:" U and V wind component at all levels.
":(?:U|V)GRD:[0-9]+ hybrid" U and V wind components at all hybrid levels
":(?:U|V)GRD:[0-9]+ mb" U and V wind components at all pressure levels
":.GRD:" (Same as above)
":(TMP|DPT):" Temperature and Dew Point for all levels .
":(TMP|DPT|RH):" TMP, DPT, and Relative Humidity for all levels.
":REFC:" Composite Reflectivity
":surface:" All variables at the surface.
============================= ===============================================
If you need help with regular expression, search the web or look at
this cheatsheet: https://www.petefreitag.com/cheatsheets/regex/.
self.verbose=True
Now let's look at the GRIB2 and index file URLs.
[5]:
print(H.grib)
print(H.idx)
https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20220423/conus/hrrr.t00z.wrfsfcf00.grib2
https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20220423/conus/hrrr.t00z.wrfsfcf00.grib2.idx
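The two URLs above follow a regular pattern. As a rough sketch (not Herbie's own code), the AWS location of a HRRR surface file can be rebuilt from the initialization time and the lead time. The helper name `hrrr_aws_url` is hypothetical, and the template is only inferred from the URLs printed in this tutorial.

```python
from datetime import datetime

# Template inferred from the AWS URLs printed in this tutorial (an assumption,
# not an official description of the archive layout).
AWS_TEMPLATE = (
    "https://noaa-hrrr-bdp-pds.s3.amazonaws.com/"
    "hrrr.{date:%Y%m%d}/conus/hrrr.t{date:%H}z.wrfsfcf{fxx:02d}.grib2"
)


def hrrr_aws_url(date: datetime, fxx: int = 0) -> str:
    """Hypothetical helper: build the AWS URL of a HRRR surface file."""
    return AWS_TEMPLATE.format(date=date, fxx=fxx)


grib_url = hrrr_aws_url(datetime(2022, 4, 23, 0), fxx=0)
idx_url = grib_url + ".idx"  # the index file sits next to the GRIB2 file
print(grib_url)
print(idx_url)
```

The same helper with `datetime(2021, 5, 4, 9)` and `fxx=15` gives the source URL reported in the download example later in this tutorial.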
Generally, you will only need to search for files using the default source priority order. But you can change the priority order if you wish.
[6]:
# Specify the source priority to only look on Pando
H = Herbie("2022-1-5", priority="pando")
🏋🏻‍♂️ Found 2022-Jan-05 00:00 UTC F00 [HRRR] [product=sfc] GRIB2 file from pando and index file from pando.
[7]:
# Specify the source priority to only look on NOMADS
H = Herbie("2022-1-5", priority="nomads")
💔 Did not find a GRIB2 or Index File for 2022-Jan-05 00:00 UTC F00 HRRR
It doesn't look like the file was found on the NOMADS server. We can tell Herbie to look at AWS after looking at NOMADS.
[8]:
# Specify the source priority.
H = Herbie("2021-5-5", priority=["nomads", "aws"])
🏋🏻‍♂️ Found 2021-May-05 00:00 UTC F00 [HRRR] [product=sfc] GRIB2 file from aws and index file from aws.
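The priority list can be read as "try each source in order and stop at the first one that has the file". The sketch below illustrates that idea only. The function `first_available`, the example URLs, and the fake `only_on_aws` check are all hypothetical stand-ins, not Herbie's API, and no network request is made.

```python
from typing import Callable, Dict, List, Optional, Tuple


def first_available(
    priority: List[str],
    sources: Dict[str, str],
    exists: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (source_name, url) for the first source whose URL exists."""
    for name in priority:
        key = name.lower()  # source names are treated as case-insensitive
        url = sources.get(key)
        if url is not None and exists(url):
            return key, url
    return None, None


# Illustrative URLs only; a real check would ask the remote server.
sources = {
    "nomads": "https://nomads.example/hrrr.grib2",
    "aws": "https://aws.example/hrrr.grib2",
}


def only_on_aws(url: str) -> bool:
    """Fake existence check: pretend the file is available only on AWS."""
    return url.startswith("https://aws.")


print(first_available(["nomads", "aws"], sources, only_on_aws))
print(first_available(["nomads"], sources, only_on_aws))
```

With both sources in the list the search falls through to AWS; with NOMADS alone nothing is found, which mirrors the two cells above.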
OK, let's ask for the 15-hour forecast from our requested datetime.
[9]:
H = Herbie("2021-5-5", fxx=15)
🏋🏻‍♂️ Found 2021-May-05 00:00 UTC F15 [HRRR] [product=sfc] GRIB2 file from aws and index file from aws.
We can also tell Herbie that the datetime we are requesting is the valid time. Herbie will adjust the model run time by the lead time requested.
[10]:
H = Herbie(valid_date="2021-5-5", fxx=15)
🏋🏻‍♂️ Found 2021-May-04 09:00 UTC F15 [HRRR] [product=sfc] GRIB2 file from local and index file from aws.
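The adjustment in the output above is plain date arithmetic: the model initialization time is the valid time minus the lead time. A minimal sketch with the standard library follows; the function name `init_time_from_valid` is hypothetical.

```python
from datetime import datetime, timedelta


def init_time_from_valid(valid_date: datetime, fxx: int) -> datetime:
    """Model initialization time = valid time minus the lead time in hours."""
    return valid_date - timedelta(hours=fxx)


run = init_time_from_valid(datetime(2021, 5, 5, 0, 0), fxx=15)
print(run)  # 2021-05-04 09:00:00
```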
Download a Full File#
If the file exists at one of the source locations, Herbie can download the full file to your local drive.
[14]:
H = Herbie(valid_date="2021-5-5", fxx=15)
H.download(verbose=True)
🏋🏻‍♂️ Found 2021-May-04 09:00 UTC F15 [HRRR] [product=sfc] GRIB2 file from aws and index file from aws.
👨🏻‍🏭 Created directory: [C:\Users\blayl_depgywe\data\hrrr\20210504]
✅ Success! Downloaded HRRR from aws
src: https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20210504/conus/hrrr.t09z.wrfsfcf15.grib2
dst: C:\Users\blayl_depgywe\data\hrrr\20210504\hrrr.t09z.wrfsfcf15.grib2
Now that the file is downloaded, Herbie reports that it is stored locally the next time you ask for it. (Index files are never downloaded, so Herbie still searches the source locations for the index file.)
[15]:
H = Herbie(valid_date="2021-5-5", fxx=15)
🏋🏻‍♂️ Found 2021-May-04 09:00 UTC F15 [HRRR] [product=sfc] GRIB2 file from local and index file from aws.
Download a Subset File#
Often you don't need the full file, just a few variables. Because the index file gives the byte range of each variable (GRIB message), Herbie can download only that portion of the file. Files can therefore be subset by variable. (Note that Herbie cannot subset a file by geographic area.)
In this example, we download every variable at 2 m above ground from the 15-h forecast.
[16]:
# The full file already exists on Local Disk
H = Herbie(valid_date="2021-5-5", fxx=15)
H.download(":2 m", verbose=True)
🏋🏻‍♂️ Found 2021-May-04 09:00 UTC F15 [HRRR] [product=sfc] GRIB2 file from local and index file from aws.
📇 Download subset: [HRRR] model [sfc] product run at 2021-May-04 09:00 UTC F15
cURL from file://C:\Users\blayl_depgywe\data\hrrr\20210504\hrrr.t09z.wrfsfcf15.grib2
58 :LTPINX:2 m above ground:15 hour fcst
71 :TMP:2 m above ground:15 hour fcst
72 :POT:2 m above ground:15 hour fcst
73 :SPFH:2 m above ground:15 hour fcst
74 :DPT:2 m above ground:15 hour fcst
75 :RH:2 m above ground:15 hour fcst
💾 Saved the above subset to C:\Users\blayl_depgywe\data\hrrr\20210504\hrrr.t09z.wrfsfcf15.grib2.subset_cac774946a4df951a374b169e9345aa6ab9ed8b0
If we ask to download this file again, Herbie tells us we already have a local copy. But we can overwrite it if we need to.
[18]:
# The Subset File Already Exists
H = Herbie(valid_date="2021-5-5", fxx=15)
H.download(":2 m", verbose=True, overwrite=True)
🏋🏻‍♂️ Found 2021-May-04 09:00 UTC F15 [HRRR] [product=sfc] GRIB2 file from local and index file from aws.
📇 Download subset: [HRRR] model [sfc] product run at 2021-May-04 09:00 UTC F15
cURL from file://C:\Users\blayl_depgywe\data\hrrr\20210504\hrrr.t09z.wrfsfcf15.grib2
58 :LTPINX:2 m above ground:15 hour fcst
71 :TMP:2 m above ground:15 hour fcst
72 :POT:2 m above ground:15 hour fcst
73 :SPFH:2 m above ground:15 hour fcst
74 :DPT:2 m above ground:15 hour fcst
75 :RH:2 m above ground:15 hour fcst
💾 Saved the above subset to C:\Users\blayl_depgywe\data\hrrr\20210504\hrrr.t09z.wrfsfcf15.grib2.subset_cac774946a4df951a374b169e9345aa6ab9ed8b0
[19]:
# Now download the full file with overwrite
H = Herbie(valid_date="2021-5-5", fxx=15, overwrite=True)
H.download()
🏋🏻‍♂️ Found 2021-May-04 09:00 UTC F15 [HRRR] [product=sfc] GRIB2 file from aws and index file from aws.
🚛💨 Download Progress: 100.00% of 158.5 MB
Index files and Subset Search String#
Each GRIB2 file should have a companion inventory, or index, file. Its name is usually the GRIB2 filename with the .idx suffix appended. This file is important because it gives the byte range of each variable (GRIB message), which lets us do a partial download of the file using cURL.
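To make the byte-range idea concrete, here is a small sketch that parses index lines and derives one range per message. It assumes the usual wgrib2-style layout (`message:start_byte:d=date:variable:level:forecast:`); the three sample lines are assembled from the last rows of the index table shown later in this tutorial, and `parse_idx` is a hypothetical helper, not part of Herbie.

```python
# Sample lines assembled from values in this tutorial's index table.
# The line layout is an assumption about the wgrib2-style index format.
IDX_TEXT = """\
171:153642912:d=2021050409:SBT124:top of atmosphere:15 hour fcst:
172:155314328:d=2021050409:SBT113:top of atmosphere:15 hour fcst:
173:156883538:d=2021050409:SBT114:top of atmosphere:15 hour fcst:
"""


def parse_idx(text):
    """Parse wgrib2-style index lines into dicts with a byte range per message."""
    rows = []
    for line in text.strip().splitlines():
        msg, start, _ref, variable, level, forecast = line.split(":")[:6]
        rows.append(
            {
                "grib_message": int(msg),
                "start_byte": int(start),
                "variable": variable,
                "level": level,
                "forecast_time": forecast,
            }
        )
    # A message ends where the next one starts; the last message is open-ended.
    for row, nxt in zip(rows, rows[1:] + [None]):
        end = "" if nxt is None else nxt["start_byte"]
        row["range"] = f"{row['start_byte']}-{end}"
    return rows


for row in parse_idx(IDX_TEXT):
    print(row["grib_message"], row["variable"], row["range"])
```

A string such as `153642912-155314328` is the kind of value that cURL accepts through its `--range` option, which is what makes a partial download possible.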
The magic trick for subsetting the data comes down to the search string. Herbie uses a regular expression to search the lines of the index file and decide which GRIB messages to download. Some examples are as follows.
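============================= ===============================================
``searchString=``             Messages that will be downloaded
============================= ===============================================
":TMP:2 m"                    Temperature at 2 m.
":TMP:"                       Temperature fields at all levels.
":UGRD:.* mb"                 U Wind at all pressure levels.
":500 mb:"                    All variables on the 500 mb level.
":APCP:"                      All accumulated precipitation fields.
":APCP:surface:0-[1-9]*"      Accumulated precip since initialization time
":APCP:surface:[1-9]*-[1-9]*" Accumulated precip over last hour
":UGRD:10 m"                  U wind component at 10 meters.
":(U|V)GRD:(10|80) m"         U and V wind component at 10 and 80 m.
":(U|V)GRD:"                  U and V wind component at all levels.
":.GRD:"                      (Same as above)
":(TMP|DPT):"                 Temperature and Dew Point for all levels.
":(TMP|DPT|RH):"              TMP, DPT, and Relative Humidity for all levels.
":REFC:"                      Composite Reflectivity
":surface:"                   All variables at the surface.
============================= ===============================================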
If you need help with regular expressions, search the web or look at this cheatsheet: https://www.petefreitag.com/cheatsheets/regex/.
Herbie reads the index file into a pandas DataFrame. The regular expression is matched against the "search_this" column to select rows of the index file.
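The matching step can be tried without Herbie or pandas. The sketch below runs Python's `re` module over a plain list of "search_this" strings taken from this tutorial's index tables; `match_lines` is a hypothetical helper that keeps every line in which the pattern is found anywhere.

```python
import re

# A few "search_this" strings taken from the index tables in this tutorial.
search_this = [
    ":TMP:2 m above ground:15 hour fcst",
    ":UGRD:80 m above ground:15 hour fcst",
    ":VGRD:80 m above ground:15 hour fcst",
    ":UGRD:10 m above ground:15 hour fcst",
    ":VGRD:10 m above ground:15 hour fcst",
    ":UGRD:500 mb:15 hour fcst",
]


def match_lines(search_string, lines):
    """Return the lines in which the regular expression matches anywhere."""
    pattern = re.compile(search_string)
    return [line for line in lines if pattern.search(line)]


wind = match_lines(r":(?:U|V)GRD:(?:10|80) m", search_this)
temp = match_lines(r":TMP:2 m", search_this)
print(wind)
print(temp)
```

Because the pattern only has to match somewhere in the line, a short string such as `:TMP:2 m` is enough to select the full `:TMP:2 m above ground:15 hour fcst` entry.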
[20]:
H.read_idx()
[20]:
 | grib_message | start_byte | end_byte | range | reference_time | valid_time | variable | level | forecast_time | search_this |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 636814 | 0-636814 | 2021-05-04 09:00:00 | 2021-05-05 | REFC | entire atmosphere | 15 hour fcst | :REFC:entire atmosphere:15 hour fcst |
1 | 2 | 636814 | 962034 | 636814-962034 | 2021-05-04 09:00:00 | 2021-05-05 | RETOP | cloud top | 15 hour fcst | :RETOP:cloud top:15 hour fcst |
2 | 3 | 962034 | 1603774 | 962034-1603774 | 2021-05-04 09:00:00 | 2021-05-05 | var discipline=0 center=7 local_table=1 parmca... | entire atmosphere | 15 hour fcst | :var discipline=0 center=7 local_table=1 parmc... |
3 | 4 | 1603774 | 1910611 | 1603774-1910611 | 2021-05-04 09:00:00 | 2021-05-05 | VIL | entire atmosphere | 15 hour fcst | :VIL:entire atmosphere:15 hour fcst |
4 | 5 | 1910611 | 3239678 | 1910611-3239678 | 2021-05-04 09:00:00 | 2021-05-05 | VIS | surface | 15 hour fcst | :VIS:surface:15 hour fcst |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
168 | 169 | 151936733 | 151937231 | 151936733-151937231 | 2021-05-04 09:00:00 | 2021-05-05 | ICEC | surface | 15 hour fcst | :ICEC:surface:15 hour fcst |
169 | 170 | 151937231 | 153642912 | 151937231-153642912 | 2021-05-04 09:00:00 | 2021-05-05 | SBT123 | top of atmosphere | 15 hour fcst | :SBT123:top of atmosphere:15 hour fcst |
170 | 171 | 153642912 | 155314328 | 153642912-155314328 | 2021-05-04 09:00:00 | 2021-05-05 | SBT124 | top of atmosphere | 15 hour fcst | :SBT124:top of atmosphere:15 hour fcst |
171 | 172 | 155314328 | 156883538 | 155314328-156883538 | 2021-05-04 09:00:00 | 2021-05-05 | SBT113 | top of atmosphere | 15 hour fcst | :SBT113:top of atmosphere:15 hour fcst |
172 | 173 | 156883538 | | 156883538- | 2021-05-04 09:00:00 | 2021-05-05 | SBT114 | top of atmosphere | 15 hour fcst | :SBT114:top of atmosphere:15 hour fcst |
173 rows × 10 columns
[21]:
# See what messages will be downloaded by a search string.
H.read_idx("(?:U|V)GRD:(?:10|80) m")
[21]:
 | grib_message | start_byte | end_byte | range | reference_time | valid_time | variable | level | forecast_time | search_this |
---|---|---|---|---|---|---|---|---|---|---|
59 | 60 | 36906392 | 38106734 | 36906392-38106734 | 2021-05-04 09:00:00 | 2021-05-05 | UGRD | 80 m above ground | 15 hour fcst | :UGRD:80 m above ground:15 hour fcst |
60 | 61 | 38106734 | 39269979 | 38106734-39269979 | 2021-05-04 09:00:00 | 2021-05-05 | VGRD | 80 m above ground | 15 hour fcst | :VGRD:80 m above ground:15 hour fcst |
76 | 77 | 52662153 | 55043768 | 52662153-55043768 | 2021-05-04 09:00:00 | 2021-05-05 | UGRD | 10 m above ground | 15 hour fcst | :UGRD:10 m above ground:15 hour fcst |
77 | 78 | 55043768 | 57425383 | 55043768-57425383 | 2021-05-04 09:00:00 | 2021-05-05 | VGRD | 10 m above ground | 15 hour fcst | :VGRD:10 m above ground:15 hour fcst |
[22]:
# See what messages will be downloaded by a search string.
H.read_idx("(U|V)GRD:[8|5][0|5]0 mb")
C:\Users\blayl_depgywe\BB_python\Herbie\herbie\archive.py:634: UserWarning: This pattern is interpreted as a regular expression, and has match groups. To actually get the groups, use str.extract.
logic = df.search_this.str.contains(searchString)
[22]:
 | grib_message | start_byte | end_byte | range | reference_time | valid_time | variable | level | forecast_time | search_this |
---|---|---|---|---|---|---|---|---|---|---|
16 | 17 | 11102539 | 11718442 | 11102539-11718442 | 2021-05-04 09:00:00 | 2021-05-05 | UGRD | 500 mb | 15 hour fcst | :UGRD:500 mb:15 hour fcst |
17 | 18 | 11718442 | 12320354 | 11718442-12320354 | 2021-05-04 09:00:00 | 2021-05-05 | VGRD | 500 mb | 15 hour fcst | :VGRD:500 mb:15 hour fcst |
27 | 28 | 19477225 | 20120596 | 19477225-20120596 | 2021-05-04 09:00:00 | 2021-05-05 | UGRD | 850 mb | 15 hour fcst | :UGRD:850 mb:15 hour fcst |
28 | 29 | 20120596 | 20748701 | 20120596-20748701 | 2021-05-04 09:00:00 | 2021-05-05 | VGRD | 850 mb | 15 hour fcst | :VGRD:850 mb:15 hour fcst |
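Two details of the last search string are worth a closer look. Inside square brackets the `|` character is a literal, so `[8|5][0|5]0 mb` accepts more level strings than just 500 and 850 mb, and the capturing group `(U|V)` is what triggers the pandas warning shown above. The sketch below compares that pattern with a stricter one that uses non-capturing groups; the last three test lines are hypothetical and exist only to show the over-matching.

```python
import re

loose = re.compile(r"(U|V)GRD:[8|5][0|5]0 mb")     # pattern used above
strict = re.compile(r"(?:U|V)GRD:(?:500|850) mb")  # explicit alternative

# The last three lines are hypothetical; they only illustrate over-matching.
lines = [
    ":UGRD:500 mb:15 hour fcst",
    ":VGRD:850 mb:15 hour fcst",
    ":UGRD:550 mb:15 hour fcst",
    ":UGRD:800 mb:15 hour fcst",
    ":UGRD:|00 mb:15 hour fcst",
]

for line in lines:
    print(line, bool(loose.search(line)), bool(strict.search(line)))
```

For this file both patterns select the same four messages, so the result above is unaffected; the stricter form mainly guards against surprises with other products.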
Here's another example: download all variables at 500 mb.
[23]:
# Download a different subset of the file
H = Herbie(valid_date="2022-3-5 12:00", fxx=0)
H.download(":500 mb:", verbose=True)
🏋🏻‍♂️ Found 2022-Mar-05 12:00 UTC F00 [HRRR] [product=sfc] GRIB2 file from aws and index file from aws.
👨🏻‍🏭 Created directory: [C:\Users\blayl_depgywe\data\hrrr\20220305]
📇 Download subset: [HRRR] model [sfc] product run at 2022-Mar-05 12:00 UTC F00
cURL from https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20220305/conus/hrrr.t12z.wrfsfcf00.grib2
14 :HGT:500 mb:anl
15 :TMP:500 mb:anl
16 :DPT:500 mb:anl
17 :UGRD:500 mb:anl
18 :VGRD:500 mb:anl
💾 Saved the above subset to C:\Users\blayl_depgywe\data\hrrr\20220305\hrrr.t12z.wrfsfcf00.grib2.subset_8a05a62e3e874603b5d6b37737904e0ec526ce1a
Herbie creates a unique filename for each subset file when it is downloaded.
[24]:
# Show path to subset file. You should check if this path exists or not.
H.get_localFilePath(":500 mb")
[24]:
WindowsPath('C:/Users/blayl_depgywe/data/hrrr/20220305/hrrr.t12z.wrfsfcf00.grib2.subset_8a05a62e3e874603b5d6b37737904e0ec526ce1a')
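The 40-character hexadecimal suffix has the length of a SHA-1 digest, and the search strings `:500 mb:` and `:500 mb` above lead to the same suffix. That suggests the name depends on which messages were selected rather than on the search string itself. The sketch below works under that assumption. The function `subset_path` and the way the message numbers are joined are hypothetical, so it is not expected to reproduce Herbie's actual file names.

```python
import hashlib
from pathlib import Path


def subset_path(grib_path, message_numbers):
    """Hypothetical naming scheme: append a SHA-1 of the selected message numbers."""
    label = "-".join(str(n) for n in sorted(message_numbers))
    digest = hashlib.sha1(label.encode("utf-8")).hexdigest()
    return Path(f"{grib_path}.subset_{digest}")


a = subset_path("hrrr.t12z.wrfsfcf00.grib2", [14, 15, 16, 17, 18])
b = subset_path("hrrr.t12z.wrfsfcf00.grib2", [14, 15, 16, 17, 18])
c = subset_path("hrrr.t12z.wrfsfcf00.grib2", [17, 18])
print(a.name)
print(a == b, a == c)
```

The useful property is that the same selection always maps to the same file name, so an existing subset can be found again, while a different selection gets a different name and cannot overwrite it.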
Read GRIB2 file with xarray#
Herbie can read GRIB2 files with xarray via cfgrib. By default, if the requested file does not already exist on local disk, Herbie deletes it after the data is loaded into memory. (This works on Linux; removing the file does not work on Windows.)
[29]:
# Read a file with xarray that does not yet exist on disk
H = Herbie("2022-4-2 06:00", fxx=0)
Hx = H.xarray(":500 mb", verbose=True)
Hx
🏋🏻‍♂️ Found 2022-Apr-02 06:00 UTC F00 [HRRR] [product=sfc] GRIB2 file from aws and index file from aws.
👨🏻‍🏭 Created directory: [C:\Users\blayl_depgywe\data\hrrr\20220402]
📇 Download subset: [HRRR] model [sfc] product run at 2022-Apr-02 06:00 UTC F00
cURL from https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20220402/conus/hrrr.t06z.wrfsfcf00.grib2
14 :HGT:500 mb:anl
15 :TMP:500 mb:anl
16 :DPT:500 mb:anl
17 :UGRD:500 mb:anl
18 :VGRD:500 mb:anl
💾 Saved the above subset to C:\Users\blayl_depgywe\data\hrrr\20220402\hrrr.t06z.wrfsfcf00.grib2.subset_8a05a62e3e874603b5d6b37737904e0ec526ce1a
C:\Users\blayl_depgywe\BB_python\Herbie\herbie\archive.py:950: UserWarning: sorry, on windows I couldn't remove the file.
warnings.warn("sorry, on windows I couldn't remove the file.")
[29]:
<xarray.Dataset>
Dimensions:              (y: 1059, x: 1799)
Coordinates:
    time                 datetime64[ns] 2022-04-02T06:00:00
    step                 timedelta64[ns] 00:00:00
    isobaricInhPa        float64 500.0
    latitude             (y, x) float64 21.14 21.15 21.15 ... 47.86 47.85 47.84
    longitude            (y, x) float64 237.3 237.3 237.3 ... 299.0 299.0 299.1
    valid_time           datetime64[ns] 2022-04-02T06:00:00
Dimensions without coordinates: y, x
Data variables:
    t                    (y, x) float32 267.3 267.3 267.3 ... 250.7 250.8 250.8
    u                    (y, x) float32 6.373 6.373 6.373 ... -2.002 -1.94 -1.94
    v                    (y, x) float32 -4.079 -4.079 -4.079 ... 18.8 19.17
    gh                   (y, x) float32 5.853e+03 5.853e+03 ... 5.308e+03
    dpt                  (y, x) float32 244.5 244.4 244.2 ... 229.5 229.4 229.2
    gribfile_projection  object None
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    model:                   hrrr
    product:                 sfc
    description:             High-Resolution Rapid Refresh - CONUS
    remote_grib:             https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr....
    local_grib:              C:\Users\blayl_depgywe\data\hrrr\20220402\hrrr.t...
    searchString:            :500 mb
[30]:
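# On Linux the local subset file would no longer exist because it was removed.
# This notebook ran on Windows, where the file could not be removed (see the warning above), so this returns True.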
Hx.attrs["local_grib"].exists()
[30]:
True
[27]:
# You can tell Herbie not to delete the GRIB2 file
H = Herbie("2021-5-6", fxx=0)
Hx = H.xarray(":500 mb", remove_grib=False)
Hx
🏋🏻‍♂️ Found 2021-May-06 00:00 UTC F00 [HRRR] [product=sfc] GRIB2 file from aws and index file from aws.
👨🏻‍🏭 Created directory: [C:\Users\blayl_depgywe\data\hrrr\20210506]
[27]:
<xarray.Dataset>
Dimensions:              (y: 1059, x: 1799)
Coordinates:
    time                 datetime64[ns] 2021-05-06
    step                 timedelta64[ns] 00:00:00
    isobaricInhPa        float64 500.0
    latitude             (y, x) float64 21.14 21.15 21.15 ... 47.86 47.85 47.84
    longitude            (y, x) float64 237.3 237.3 237.3 ... 299.0 299.0 299.1
    valid_time           datetime64[ns] 2021-05-06
Dimensions without coordinates: y, x
Data variables:
    t                    (y, x) float32 ...
    u                    (y, x) float32 ...
    v                    (y, x) float32 ...
    gh                   (y, x) float32 ...
    dpt                  (y, x) float32 ...
    gribfile_projection  object None
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    model:                   hrrr
    product:                 sfc
    description:             High-Resolution Rapid Refresh - CONUS
    remote_grib:             https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr....
    local_grib:              C:\Users\blayl_depgywe\data\hrrr\20210506\hrrr.t...
    searchString:            :500 mb
[31]:
# The local GRIB2 file does exist
Hx.attrs["local_grib"].exists()
[31]:
True
Working with multiple Herbie objects#
Use the fast Herbie functions when you want to work with multiple files at once. Fast Herbie uses multithreading to speed up what would otherwise be the sequential creation of Herbie objects.
Creating a Herbie object involves a lot of network traffic: Herbie checks whether a GRIB2 file exists at many different remote archives and also looks for index files. Downloading the GRIB2 files is also done mainly over the network, and reading the data into xarray depends on those downloads.
Multithreading is useful for I/O-bound tasks. As I understand it, communication across the internet falls under this category, so multiple threads can help when creating many Herbie objects.
- Creating lots of Herbie objects (herbie.tools.fast_Herbie)
- Downloading lots of files (herbie.tools.fast_Herbie_download)
- Loading lots of files into xarray (herbie.tools.fast_Herbie_xarray)
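The general pattern behind these helpers can be sketched with the standard library. This is not the implementation in herbie.tools. It is a minimal illustration of running I/O-bound lookups in a thread pool, where the hypothetical `fake_lookup` sleeps briefly to stand in for the network requests made when a Herbie object is created.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta
from itertools import product


def fake_lookup(task):
    """Stand-in for creating one Herbie object; sleeps to mimic network latency."""
    date, fxx = task
    time.sleep(0.01)
    return f"run {date:%Y-%m-%d %H:%M} F{fxx:02d}"


dates = [datetime(2022, 1, 1) + timedelta(hours=h) for h in range(6)]
lead_times = range(7)
tasks = list(product(dates, lead_times))  # every date paired with every lead time

# Threads wait on I/O concurrently; map() returns results in task order.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fake_lookup, tasks))

print(len(results))  # 42
print(results[0], "|", results[-1])
```

Six dates and seven lead times give 42 tasks. While one thread waits on its simulated request the others can proceed, which is why threads suit this kind of work.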
[32]:
from herbie.tools import fast_Herbie, fast_Herbie_download, fast_Herbie_xarray
import pandas as pd
[33]:
# Use pandas to create a list of Datetimes
DATES = pd.date_range("2022-01-01", periods=6, freq="1H")
# Create a list of forecast lead times
fxx = [0, 1, 2, 3, 4, 5, 6]
Fast Herbie#
Create many Herbie objects for all the dates and lead times requested.
[34]:
# Create many Herbie objects (for all dates and lead times requested)
HH = fast_Herbie(DATES=DATES, fxx=fxx)
len(HH), HH
[34]:
(42,
[[HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F00,
[HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F01,
[HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F02,
[HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F03,
[HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F04,
[HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F05,
[HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F06,
[HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F00,
[HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F01,
[HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F02,
[HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F03,
[HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F04,
[HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F05,
[HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F06,
[HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F00,
[HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F01,
[HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F02,
[HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F03,
[HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F04,
[HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F05,
[HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F06,
[HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F00,
[HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F01,
[HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F02,
[HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F03,
[HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F04,
[HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F05,
[HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F06,
[HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F00,
[HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F01,
[HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F02,
[HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F03,
[HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F04,
[HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F05,
[HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F06,
[HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F00,
[HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F01,
[HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F02,
[HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F03,
[HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F04,
[HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F05,
[HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F06])
Fast Herbie Download#
Download a subset of each file for many Herbie objects.
[35]:
# Download those Herbie objects (subset by 2-m temperature)
a = fast_Herbie_download(DATES=DATES, fxx=fxx, searchString="TMP:2 m")
a
👨🏻‍🏭 Created directory: [C:\Users\blayl_depgywe\data\hrrr\20220101]
[35]:
{'passed': [[HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F00,
[HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F01,
[HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F02,
[HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F03,
[HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F04,
[HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F05,
[HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F06,
[HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F00,
[HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F01,
[HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F02,
[HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F03,
[HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F04,
[HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F05,
[HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F06,
[HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F00,
[HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F01,
[HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F02,
[HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F03,
[HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F04,
[HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F05,
[HRRR] model [sfc] product run at 2022-Jan-01 02:00 UTC F06,
[HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F00,
[HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F01,
[HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F02,
[HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F03,
[HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F04,
[HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F05,
[HRRR] model [sfc] product run at 2022-Jan-01 03:00 UTC F06,
[HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F00,
[HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F01,
[HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F02,
[HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F03,
[HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F04,
[HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F05,
[HRRR] model [sfc] product run at 2022-Jan-01 04:00 UTC F06,
[HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F00,
[HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F01,
[HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F02,
[HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F03,
[HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F04,
[HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F05,
[HRRR] model [sfc] product run at 2022-Jan-01 05:00 UTC F06],
'failed': []}
Fast Herbie xarray#
Read the data into an xarray Dataset. Notice that this concatenates all the files along the datetime (t) and lead time (f) dimensions.
NOTE: The searchString must return data on the same hypercube (the data must be on the same type of level; see cfgrib for more details). For instance, you shouldn't load 2-m and 500 hPa data in the same object.
WARNING: You could run into memory limits if you request too much data.
[36]:
# Read into xarray
ds = fast_Herbie_xarray(DATES=DATES[:3], fxx=[0, 6], searchString="(?:U|V)GRD:10 m")
ds
C:\Users\blayl_depgywe\BB_python\Herbie\herbie\archive.py:950: UserWarning: sorry, on windows I couldn't remove the file.
warnings.warn("sorry, on windows I couldn't remove the file.")
[36]:
<xarray.Dataset>
Dimensions:              (t: 3, f: 2, y: 1059, x: 1799)
Coordinates:
    time                 (t) datetime64[ns] 2022-01-01 ... 2022-01-01T02:00:00
    step                 (f) timedelta64[ns] 00:00:00 06:00:00
    heightAboveGround    float64 10.0
    latitude             (y, x) float64 21.14 21.15 21.15 ... 47.86 47.85 47.84
    longitude            (y, x) float64 237.3 237.3 237.3 ... 299.0 299.0 299.1
    valid_time           (f, t) datetime64[ns] 2022-01-01 ... 2022-01-01T08:0...
Dimensions without coordinates: t, f, y, x
Data variables:
    u10                  (f, t, y, x) float32 -3.392 -3.33 ... -1.313 -1.375
    v10                  (f, t, y, x) float32 -5.777 -5.777 ... 2.929 2.991
    gribfile_projection  (f, t) object None None None None None None
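The `valid_time` coordinate in this output has dimensions (f, t) because every combination of lead time and initialization time has its own valid time. A short standard-library sketch of that grid, using the three run times and two lead times requested above:

```python
from datetime import datetime, timedelta

times = [datetime(2022, 1, 1, hour) for hour in range(3)]  # t dimension
steps = [timedelta(hours=h) for h in (0, 6)]               # f dimension

# valid_time[f][t] = initialization time + lead time
valid_time = [[t + step for t in times] for step in steps]

for row in valid_time:
    print([v.strftime("%Y-%m-%d %H:%M") for v in row])
```

The first entry is 2022-01-01 00:00 and the last is 2022-01-01 08:00, consistent with the range shown for `valid_time` in the Dataset summary.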
Plot with herbie xarray custom accessor#
WORK IN PROGRESS
This requires my Carpenter Workshop functions.
[24]:
Hx.herbie.plot()
cfgrib variable: t
GRIB_cfName air_temperature
GRIB_cfVarName t
GRIB_name Temperature
GRIB_units K
GRIB_typeOfLevel isobaricInhPa
/p/home/blaylock/anaconda3/envs/basic38/lib/python3.8/site-packages/cartopy/mpl/geoaxes.py:1702: UserWarning: The input coordinates to pcolormesh are interpreted as cell centers, but are not monotonically increasing or decreasing. This may lead to incorrectly calculated cell edges, in which case, please supply explicit cell edges to pcolormesh.
X, Y, C, shading = self._pcolorargs('pcolormesh', *args,
cfgrib variable: u
GRIB_cfName eastward_wind
GRIB_cfVarName u
GRIB_name U component of wind
GRIB_units m s**-1
GRIB_typeOfLevel isobaricInhPa
cfgrib variable: v
GRIB_cfName northward_wind
GRIB_cfVarName v
GRIB_name V component of wind
GRIB_units m s**-1
GRIB_typeOfLevel isobaricInhPa
cfgrib variable: gh
GRIB_cfName geopotential_height
GRIB_cfVarName gh
GRIB_name Geopotential Height
GRIB_units gpm
GRIB_typeOfLevel isobaricInhPa
cfgrib variable: dpt
GRIB_cfName unknown
GRIB_cfVarName dpt
GRIB_name Dew point temperature
GRIB_units K
GRIB_typeOfLevel isobaricInhPa
[24]:
<GeoAxesSubplot:title={'left':'Run: 00:00 UTC 06 May 2021 F00','center':'HRRR 500 hPa','right':'Valid: 00:00 UTC 06 May 2021'}>




