Aggregates That Lie: A Framework Audit of CPI, GDP, and the 2% Target

Jason D. Keys
Series: New Austrian Economics, Watching the Cracks (5 of 12)
Tags: CPI, GDP, inflation, Federal Reserve, Boskin Commission, Menger, Fekete, monetary theory, measurement, saleability

The April 2026 Consumer Price Index reading, released by the Bureau of Labor Statistics on May 12, came in at 3.8% year-over-year, with core CPI at 2.8%. The Federal Reserve continues to describe 2% as the price-stability target. The Q1 2026 GDP release described the U.S. economy as having grown at a 2.3% annualized real rate. These three numbers — the headline inflation rate, the policy target, and the growth figure — together constitute the dominant macroeconomic vocabulary in which American economic discussion is conducted, by every major news outlet, every congressional debate, every Federal Reserve meeting, every long-term household financial plan.

The framework's reading is that all three numbers are structurally incapable of measuring what they claim to measure. Not because of bad faith. Not because of methodological sloppiness — the Bureau of Labor Statistics employs some of the most careful statisticians in the world and the methodology documentation runs to thousands of pages. Not even because the numbers are wrong on their own terms, since each aggregate is internally consistent with its own definitional choices. The numbers fail because the conceptual approach of summarizing a heterogeneous, dynamic, household-experience-dependent phenomenon into a single national scalar is wrong for the questions the readings are being used to answer.

This essay is the fifth installment of the Watching the Cracks series. Where the prior installments examined specific empirical sub-systems — banking failures, metro housing markets, the Florida insurance crisis — this one engages the broader question of whether the aggregate measurement framework itself is fit for purpose in 2026. The answer the framework reaches is direct: it is not, the failures are structural rather than transient, and the alternative measurement approaches the New Austrian framework has been developing across this catalog now constitute a coherent program that produces meaningfully different diagnostic readings than the official aggregates.

The essay proceeds in five parts. First, the 2% target as a monetarist artifact and what its arithmetic actually implies. Second, the methodology shifts since 1996 that have systematically lowered measured CPI. Third, the structural reasons no national aggregate can capture what households actually experience. Fourth, the existing critique literature (Boskin Commission retrospective, ShadowStats, MIT Billion Prices Project, Truflation) and what it gets right and wrong. Fifth, the framework's alternative measurement program, drawing on the metro saleability map, the Mengerian Stress Index, and the diagnostic apparatus assembled across the prior nineteen essays of this catalog.

The 2% target and what Rule of 72 actually means

The Federal Reserve formalized its 2% inflation objective on January 25, 2012, when the Federal Open Market Committee published its first explicit longer-run goals statement. The target had been operational for several years before that: the Committee was widely understood to be operating to an implicit objective of roughly 2% from approximately 2003 forward, during Ben Bernanke's time as a Fed governor and then, from 2006, as chair. But 2012 marked its formal entry into the Fed's published framework.

The 2% target was not a Federal Reserve original. It was imported from the Reserve Bank of New Zealand, which in its 1990 Policy Targets Agreement with the New Zealand government committed to maintaining inflation in a 0-2% band. The RBNZ approach, championed by then-Governor Don Brash, was a direct application of monetarist macroeconomic theory — the view that central bank policy should focus narrowly on a measurable price-stability target, allowing markets to operate freely around that anchor. The framework was picked up by the Bank of England under inflation targeting in 1992, by Canada in the early 1990s, by Sweden in 1993, and eventually by most developed-economy central banks through the 2000s.

The intellectual lineage is therefore Keynesian-monetarist synthesis, not Austrian. Menger's framework treats money as the most saleable commodity in a market — an asset whose properties emerge from market participants' choices and whose relative value is determined by the same forces that determine any other good's exchange value. There is no role in Menger's framework for a central authority to set a target for the rate of money's purchasing power decay. Fekete, working in the Menger tradition, was explicit that the post-1971 fiat monetary system's tendency toward continuous purchasing-power erosion was the central economic pathology of the modern period, not a policy parameter to be calibrated.

The framework's specific objection to the 2% target is mathematical rather than ideological. The Rule of 72, a banker's heuristic for compound growth, states that money doubles in a number of years roughly equal to 72 divided by the annual percentage rate, and conversely that purchasing power halves over the same number of years when the rate is treated as inflation. At 2% inflation, purchasing power halves in about 36 years by the heuristic (about 35 by exact compounding). Over a 75-year span (the time horizon of a household working from age 25 to retirement at 67, then living to age 100) the dollar loses approximately 77% of its purchasing power as the explicit central bank policy objective. A worker in 2026 earning $60,000 per year, contributing to a retirement account expected to be drawn down beginning in 2068, is being told by the central bank's own published target that a dollar saved today will have lost more than half of its purchasing power by the time drawdown begins and approximately three-quarters by the end of a long retirement, by design.
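The arithmetic above can be checked directly. The following is a minimal sketch, assuming a constant 2% annual inflation rate held for the whole horizon; the function names are illustrative and not part of any published tool.

```python
import math


def rule_of_72_years(rate_pct: float) -> float:
    """Heuristic halving (or doubling) time in years: 72 / rate in percent."""
    return 72.0 / rate_pct


def exact_halving_years(rate_pct: float) -> float:
    """Exact halving time under annual compounding: ln(2) / ln(1 + r)."""
    return math.log(2.0) / math.log(1.0 + rate_pct / 100.0)


def purchasing_power_remaining(rate_pct: float, years: int) -> float:
    """Fraction of today's purchasing power left after `years` of inflation."""
    return 1.0 / (1.0 + rate_pct / 100.0) ** years


# Horizons taken from the text: 42 years to drawdown (2026 to 2068),
# 75 years from age 25 to age 100.
for horizon in (36, 42, 75):
    lost = 1.0 - purchasing_power_remaining(2.0, horizon)
    print(f"{horizon} years at 2%: {lost:.1%} of purchasing power lost")
```

The heuristic and the exact figure differ by about a year at 2%, which does not change the conclusion: the loss passes one-half before a typical retirement begins and reaches roughly three-quarters over the 75-year horizon.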

This is not a small policy parameter. It is the explicit acceptance of a particular trajectory of monetary depreciation, set by an institution whose statutory mandate, in language added to the Federal Reserve Act in 1977, is to "maintain long-run growth of monetary and credit aggregates commensurate with the economy's long-run potential to increase production, so as to promote effectively the goals of maximum employment, stable prices, and moderate long-term interest rates." The phrase "stable prices" appears in the statute. The framework's reading is that 2% annual depreciation, sustained across multi-decade horizons, is not stability in any sense Menger or Fekete would have recognized. It is policy-mandated capital erosion at a specifiable rate.

The 2% target is also, in framework terms, the wrong objective even if it could be achieved. Menger's saleability framework asks what properties make a monetary good more or less useful for exchange. Continuous depreciation impairs the savings (store-of-value) function, one of the four classical functions of money, erodes the unit-of-account function over multi-decade contracts, and operates as a Feketean extraction on every monetary balance held longer than the period of high-cost rebalancing. A central bank pursuing 2% depreciation as a target is, by the framework's standards, deliberately impairing that function in pursuit of objectives (employment, output, financial stability) that the same framework treats as structurally indeterminate at the aggregate level.

The two-percent target is therefore not, in the framework's reading, a good target that happens to be missed in practice. It is a bad target in its own terms, mathematically guaranteed to halve currency purchasing power within ordinary career horizons, anchored in monetarist macroeconomics rather than Mengerian saleability, and incoherent with the statutory language ("stable prices") under which the Federal Reserve is supposed to operate.

The methodology shifts since 1996

If the 2% target is the wrong objective, the next question is whether the measurement the target is calibrated against is meaningful. The framework's reading is that it is not — and the reasons are operationally specific.

In December 1996, a five-member commission appointed by the U.S. Senate Finance Committee, chaired by Stanford economist Michael Boskin, published its final report under the title Toward A More Accurate Measure Of The Cost Of Living. The commission concluded that the official CPI was overstating actual cost-of-living change by approximately 1.1 percentage points per year in 1996 and roughly 1.3 percentage points per year in earlier periods. The commission identified four specific sources of overstatement: substitution bias (consumers respond to price changes by substituting cheaper alternatives, but the CPI assumed a fixed basket), outlet bias (consumers shift toward discount retailers, but the CPI sampled established outlets), quality bias (improvements in products were treated as price increases rather than quality enhancements), and new-product bias (innovative products entered the index too slowly to capture their early-period welfare effects).

The Boskin recommendations were politically consequential because the CPI was — and remains — the indexing basis for Social Security cost-of-living adjustments, federal tax bracket adjustments, federal pension benefits, and approximately 4.4% of federal spending overall. A 1-percentage-point reduction in measured CPI translates to roughly $100 billion in reduced federal indexed spending over a decade. Congress did not formally adopt the Boskin recommendations, but the Bureau of Labor Statistics implemented substantially all of them through methodology changes that began in 1995 and accelerated through 1999.

The most consequential single change was the adoption of geometric mean weighting for the lower level of the index (the level at which prices for specific goods within categories are combined). Before 1999, the BLS used an arithmetic Laspeyres-type formula at this level, which assumed consumers would continue to buy the same quantities of each good as prices changed. From January 1999, the BLS used a geometric mean formula for most item categories, which implicitly assumes consumers substitute toward cheaper goods within a category as relative prices change. The geometric formulation produces lower measured inflation whenever price changes within a category diverge, which is most of the time, for most categories. The BLS's own analysis estimated this single methodology shift reduced measured CPI by approximately 0.27 percentage points per year through the late 1990s. Subsequent retrospective work by Robert J. Gordon (NBER Working Paper 12311, 2006) suggested the cumulative effect across all the post-Boskin changes had reduced measured CPI by approximately 0.8 percentage points per year by the mid-2000s.
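Why the geometric formulation reads lower can be seen with a toy category. This is a simplified sketch, assuming three equally weighted items with hypothetical price relatives (new price divided by old price); the BLS formulas use expenditure weights and sampled quotes, so the numbers here are illustrative only.

```python
import math


def arithmetic_mean_relative(relatives):
    """Equal-weight arithmetic mean of price relatives (fixed-quantity flavor)."""
    return sum(relatives) / len(relatives)


def geometric_mean_relative(relatives):
    """Equal-weight geometric mean of price relatives (substitution flavor)."""
    return math.exp(sum(math.log(r) for r in relatives) / len(relatives))


# Hypothetical category: one item up 10%, one flat, one down 5%.
relatives = [1.10, 1.00, 0.95]

arith = arithmetic_mean_relative(relatives)
geo = geometric_mean_relative(relatives)
print(f"arithmetic: {arith - 1:.2%}  geometric: {geo - 1:.2%}")
```

The two averages agree only when every item in the category moves by the same proportion. Any dispersion in price changes pulls the geometric mean below the arithmetic mean, which is the mechanism the paragraph above describes.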

The second consequential change was the expansion of hedonic adjustments for product quality. The BLS now uses hedonic regressions to estimate the value households assign to quality improvements in computers, televisions, apparel, rental housing, refrigerators, dishwashers, and several other categories. The mechanism: when a new computer at the same price replaces an older model with less RAM and a slower processor, the hedonic adjustment treats some portion of the price as a decrease (because the new computer is "better," even though the household paid the same dollar amount). The framework reads this as conceptually distinct from cost-of-living measurement. A household trying to maintain a standard of computing capability relative to current norms (because work, communication, and basic functioning require current-spec equipment) cannot benefit from hedonic improvements that the index treats as price decreases. The household experiences the new equipment at the same dollar cost; the index reports a hedonic-adjusted price decline.

The third consequential change was the treatment of housing through Owners' Equivalent Rent (OER). Before 1983, the CPI measured housing costs through a direct asset-price approach: the cost of buying and financing a home. In 1983, the BLS shifted to OER, which measures the rental value of owner-occupied housing through surveys of rental properties of similar type. The methodological argument for the shift is defensible (purchasing a home is partly an investment, and the consumption value of housing services is more accurately captured by rental equivalence). The empirical consequence has been that OER produces meaningfully lower inflation readings than direct house-price-and-financing-cost measurement during periods when home prices and mortgage rates rise faster than rents, which describes much of the post-1996 period outside the 2007-2012 housing bust. The Case-Shiller home price index rose approximately 240% between 1996 and 2024; the OER component of CPI rose approximately 95% over the same period. Households purchasing homes experienced something closer to the Case-Shiller trajectory; the index reflected something closer to the OER trajectory.
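The cumulative figures are easier to compare as annualized rates. A minimal sketch, taking the essay's approximate cumulative increases at face value and assuming a 28-year window (1996 to 2024); the helper name is illustrative.

```python
def annualized_rate_pct(cumulative_pct: float, years: int) -> float:
    """Constant annual rate (in percent) that compounds to the cumulative rise."""
    growth_factor = 1.0 + cumulative_pct / 100.0
    return (growth_factor ** (1.0 / years) - 1.0) * 100.0


YEARS = 2024 - 1996  # 28-year window, an assumption about the endpoints

home_prices = annualized_rate_pct(240.0, YEARS)  # approximate Case-Shiller rise
oer = annualized_rate_pct(95.0, YEARS)           # approximate OER rise

print(f"home prices: {home_prices:.2f}%/yr  OER: {oer:.2f}%/yr  "
      f"gap: {home_prices - oer:.2f} pp/yr")
```

On these inputs the two series differ by roughly two percentage points per year, sustained for nearly three decades.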

The cumulative effect of these and several other methodology shifts is that the CPI as published in 2026 is calculated under substantially different methodology than the CPI as published in 1995. Cross-period comparisons of inflation rates are not strictly comparable. The post-Boskin methodology produces readings approximately 1.0-1.3 percentage points lower than the pre-Boskin methodology would have produced for the same underlying price data. Across 30 years, a gap of that size compounds to a difference of approximately 35-47% in the cumulative price level versus what the pre-Boskin methodology would have shown.
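The compounding claim can be reproduced in a few lines. This sketch assumes the annual gap is constant and treats it as a standalone multiplicative factor, which is an approximation to the ratio of the two price-level paths; the function name is illustrative.

```python
def cumulative_gap_pct(annual_gap_pp: float, years: int) -> float:
    """Cumulative price-level gap (percent) from a constant annual gap."""
    return ((1.0 + annual_gap_pp / 100.0) ** years - 1.0) * 100.0


# The essay's range: 1.0 to 1.3 percentage points per year over 30 years.
for gap in (1.0, 1.3):
    print(f"{gap} pp/yr over 30 years: {cumulative_gap_pct(gap, 30):.1f}%")
```

A gap that looks small in any single annual release becomes a difference of more than a third in the price level over a working lifetime.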

The framework's reading is not that the BLS is engaged in fraud or that the methodology changes are illegitimate on their own technical terms. Each individual change is defensible within its own analytical frame. The framework's observation is structural: the methodology choices were made by an institution whose statistical output was politically and fiscally consequential, in directions that systematically reduced measured inflation, with each change adopted through internal technical processes that did not receive the kind of public scrutiny that a 30-year cumulative reduction of one-third in reported inflation would have warranted if presented as a single policy decision. The framework does not need to allege bad faith. The structural pressure on the measurement institution to produce lower readings — through reasonable-sounding individual choices each justified on its own terms — is sufficient to explain the trajectory.

Why no national aggregate can capture household experience

Even if the CPI methodology were optimal in every individual respect, the framework's deeper objection would remain: no single national aggregate can capture what households actually experience as the cost of living, because the household-level experience is structurally heterogeneous in ways no aggregate can resolve.

The metro saleability map developed in Article 17 of this catalog made this concrete at the geographic level. Forty major U.S. metros, evaluated on four observable indicators, produced sixteen metros in clear stress, fourteen in stable condition, and ten transitional. The cumulative 10-year carrying-cost gap between Lakeland and Columbus, on identical home prices, was approximately $90,000 — a difference larger than the median annual household income in either metro. A single national CPI reading cannot describe both of these households' experiences simultaneously, because they are experiencing structurally different inflation trajectories driven by structurally different cost components.

The geographic decomposition is one dimension. The framework can identify at least five others where the same aggregation problem operates:

Income-quintile heterogeneity. Households in the bottom income quintile spend approximately 36% of income on food and 35% on housing (rent or mortgage plus utilities), per BLS Consumer Expenditure Survey data. Households in the top income quintile spend approximately 11% on food and 27% on housing, with much larger shares allocated to transportation, healthcare, entertainment, and savings. The same price changes produce structurally different cost-of-living impacts across income quintiles, because the basket weights differ substantially (the food share alone by a factor of roughly three). The CPI publishes one number that purports to summarize both.

Age-cohort heterogeneity. A 70-year-old household spends approximately 14% of income on healthcare; a 30-year-old household spends approximately 5%. Healthcare prices have risen faster than the overall CPI across the post-2000 period (medical care services component up approximately 92% from 2000-2024 vs. all-items CPI up approximately 75%, or roughly 1.2 times the cumulative increase). The framework's reading: the 70-year-old household has experienced meaningfully higher actual inflation than the 30-year-old household for two decades, and the single aggregate cannot describe either accurately.

Life-stage heterogeneity. Households with school-age children spend approximately 8% of income on child-related expenses (childcare, K-12 educational supplements, age-specific consumption) that essentially do not appear in childless household budgets. Childcare prices have risen approximately 145% from 2000-2024, well above headline CPI. Education prices have risen approximately 175% over the same period. A household in the active child-rearing phase faces a fundamentally different inflation environment than the same household ten years before or ten years after.

Consumption-mix heterogeneity. Households differ in their consumption preferences in ways the BLS attempts to capture through expenditure surveys but cannot resolve at the individual level. A household that consumes primarily restaurant meals has experienced different food-cost inflation than one that consumes primarily home-cooked meals. A household that drives a vehicle 25,000 miles per year experiences different transportation-cost inflation than one that drives 8,000 miles per year. The aggregate flattens these distinctions to a national average.

Time-horizon heterogeneity. The CPI is a Laspeyres-style backward-looking index applied to a basket whose weights reflect past consumption. By the time the basket is updated to reflect new consumption patterns, the patterns have already shifted. The household making a 30-year housing decision needs forward-looking information about the structural saleability and carrying-cost trajectory of their candidate metros; the CPI cannot provide this because it is, by construction, looking at where prices have been rather than where the underlying cost-structure is moving.
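The aggregation problem running through these five dimensions can be made concrete with a weighted-average sketch. The food and housing shares below are the quintile figures quoted above, treated here as basket weights with the remainder lumped into "other"; the component price changes are hypothetical placeholders, not BLS data.

```python
def household_inflation(weights: dict, price_changes_pct: dict) -> float:
    """Weighted average of component price changes for one household basket."""
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"basket weights must sum to 1, got {total}")
    return sum(weights[k] * price_changes_pct[k] for k in weights)


# Shares from the text; "other" is the assumed remainder.
bottom_quintile = {"food": 0.36, "housing": 0.35, "other": 0.29}
top_quintile = {"food": 0.11, "housing": 0.27, "other": 0.62}

# Hypothetical one-year price changes in percent (illustration only).
price_changes = {"food": 5.0, "housing": 6.0, "other": 2.0}

low = household_inflation(bottom_quintile, price_changes)
high = household_inflation(top_quintile, price_changes)
print(f"bottom quintile: {low:.2f}%  top quintile: {high:.2f}%")
```

With identical prices, the two baskets produce inflation rates about a percentage point apart on these inputs. Repeating the exercise by age cohort, life stage, or metro only widens the spread that a single published number has to average away.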

The Phoenix wage-and-price episodes from the early 2000s are a useful illustration. Mutual fund company employee meetings in Arizona during that period — the housing run-up that preceded the 2008 crash — produced specific household complaints that wage increases were not keeping pace with the cost of a home. The complaints were dismissed at the time as anecdotal. The CPI's OER methodology was correctly producing low housing-cost readings (rental equivalence in metros with high homeownership rates lags the asset-price trajectory). The framework's reading is that the households were empirically correct and the aggregate was failing to capture the relevant phenomenon — the asset-price inflation that determines household balance-sheet outcomes was running far above the rental-flow measure that the CPI captures. The aggregate gave a reading that was internally consistent and analytically defensible while completely missing the experience the households were articulating.

This is not a methodology problem that can be fixed by better hedonic adjustments or more frequent basket rotation. It is the structural limitation of single-number aggregates applied to phenomena that operate heterogeneously across geography, income, age, life stage, consumption mix, and time horizon. The framework's reading is that the project of producing a single "the inflation rate" is conceptually mistaken, regardless of how carefully the underlying statistics are assembled.

The existing critique literature, examined

Three external critique frameworks deserve specific engagement.

ShadowStats, the long-running website maintained by John Williams since 2004, publishes "alternative" CPI calculations purported to reflect 1980s and 1990s methodology. The ShadowStats readings consistently run 5-7 percentage points higher than official CPI. The framework's reading is that ShadowStats is directionally correct but methodologically broken. Directionally correct because the methodology shifts since 1996 have indeed produced lower readings than the pre-Boskin methodology would have produced. Methodologically broken because Williams has acknowledged, in a phone call with economist James Hamilton, that ShadowStats does not actually recalculate the underlying data — it simply adds a constant offset to the official CPI and publishes the sum. This is not a recalculation; it is a transformation. The constant offset, originally derived from the Boskin Commission's own estimate of measurement bias, has been applied for two decades without re-estimation, producing readings that drift further from defensibility each year. The framework can engage the critique that ShadowStats articulates (methodology shifts have lowered measured inflation) without endorsing the numbers ShadowStats publishes (which are not analytically supported).

The MIT Billion Prices Project, founded by Alberto Cavallo and Roberto Rigobon in 2008, scrapes online retail prices for hundreds of thousands of products across multiple countries and publishes real-time daily inflation indexes. The methodology is genuinely novel — it does not depend on BLS survey sampling, does not require basket updates, captures online prices that approximate the actual consumption experience of modern households, and operates at daily resolution rather than monthly. The framework treats the BPP as the most credible single alternative inflation measure currently in operation. Its limitations: it does not cover services well (services prices are not typically posted online), it does not cover housing (rent prices are not consistently online-scrapable), it does not cover the full basket of household expenditure. The BPP and the CPI tend to agree on average over long periods but diverge meaningfully during acute price-change episodes (the early COVID period, the 2022-2023 inflation surge), with the BPP typically showing the change earlier and more sharply.

Truflation, a daily inflation index launched in 2022, takes a similar real-time data-scraping approach to the BPP but covers a wider expenditure basket and publishes a publicly accessible composite daily reading. The Truflation methodology is documented and the index is verifiable; the readings tend to track BPP closely and to diverge from official CPI during the same acute episodes that distinguish BPP from CPI. The framework treats Truflation as a useful additional reference point, particularly for users who want a single daily reading rather than the multiple-series BPP output.

What none of these alternative measures provides is what the framework actually needs: inflation measurement that is forward-looking, geographically disaggregated, and household-specific in a way that the user can apply to their own situation. The BPP and Truflation produce better aggregates than the CPI in some respects, but they remain aggregates. The framework's diagnostic apparatus from the rest of this catalog points toward a different kind of measurement framework — one that abandons the single-number ambition and produces instead a set of geographically-specific, asset-class-specific, household-applicable saleability readings.

The framework's alternative measurement program

The framework has, across the prior nineteen essays of this catalog, produced or proposed five distinct measurement components that together constitute an alternative monetary diagnostic program. Drawing them together explicitly:

Component 1: The Mengerian Stress Index (MSI), proposed in Article 12. The MSI is a composite of five sub-indicators (paper-physical premium in precious metals, on-the-run / off-the-run Treasury spread, repo haircut dispersion, FX cross-currency basis, ETF-NAV deviation in stress) designed to measure substrate-layer monetary stress in real time. The current 2026 reading is approximately 1.15, against a pre-2022 baseline of approximately 0.25. The MSI is not a price index; it is a stress index. It captures how close the monetary substrate is to acute dysfunction, with the structural drift from 0.25 to 1.15 across four years constituting the framework's quantification of the secular saleability decay that no official aggregate reports.

Component 2: The metro saleability map, developed in Article 17. Forty major U.S. metros scored on four observable indicators (inventory imbalance, year-over-year price trajectory, foreclosure rates, property tax burden) and color-coded into three saleability tiers. The map is updated quarterly tied to ATTOM, Zillow, and Tax Foundation release cycles. It is not a single inflation reading; it is a geographic diagnosis of where housing's underlying low saleability is currently producing visible household-level stress. Reading it informs household decisions in ways no national aggregate can.

Component 3: The tax-plus-insurance wedge measurement, developed in Article 19. A direct numerical computation of the non-mortgage carrying-cost component of household housing cost, broken down across property tax, homeowners insurance, and HOA fees, with assessment-cap regime as an additional explanatory variable. The framework's specific calculation: a 10-year carrying-cost gap of approximately $90,000 between Lakeland, Florida and Columbus, Ohio on identical $400,000 home prices. The wedge is a household-applicable measurement framework. Any household can apply it to their candidate metro using publicly available data.
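The shape of the wedge computation can be sketched as follows. This is a hypothetical illustration, not the Article 19 calculator: the class, the function names, and every input for the two placeholder metros are assumptions chosen only to show the mechanics, and they are not intended to reproduce the $90,000 figure.

```python
from dataclasses import dataclass


@dataclass
class CarryingCosts:
    property_tax_rate: float  # annual tax as a fraction of home value
    insurance_annual: float   # homeowners insurance premium, dollars per year
    hoa_annual: float         # HOA fees, dollars per year


def annual_wedge(home_price: float, costs: CarryingCosts) -> float:
    """Non-mortgage carrying cost for one year."""
    return (home_price * costs.property_tax_rate
            + costs.insurance_annual
            + costs.hoa_annual)


def cumulative_wedge(home_price: float, costs: CarryingCosts,
                     years: int, annual_growth: float = 0.0) -> float:
    """Carrying cost summed over `years`, optionally growing each year."""
    base = annual_wedge(home_price, costs)
    return sum(base * (1.0 + annual_growth) ** t for t in range(years))


# Placeholder inputs for two unnamed metros (illustration only).
metro_a = CarryingCosts(property_tax_rate=0.010, insurance_annual=6000.0, hoa_annual=3000.0)
metro_b = CarryingCosts(property_tax_rate=0.014, insurance_annual=1400.0, hoa_annual=0.0)

HOME_PRICE = 400_000.0
gap = cumulative_wedge(HOME_PRICE, metro_a, 10) - cumulative_wedge(HOME_PRICE, metro_b, 10)
print(f"10-year carrying-cost gap: ${gap:,.0f}")
```

A household would substitute its own county tax rate, insurance quotes, and HOA dues. The `annual_growth` parameter allows for premiums and assessments that rise over the holding period, which an assessment-cap regime would constrain for the tax component.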

Component 4: Real-time price scraping (MIT BPP / Truflation integration). The framework can incorporate the BPP daily indexes and Truflation real-time data as supplementary signals, particularly for the goods component of household expenditure that the metro-saleability work does not cover. These external indexes are themselves alternative measurement frameworks; the framework's contribution is to integrate them with the geographic and substrate-stress work into a coherent practitioner-grade dashboard.

Component 5: Household-specific saleability assessment. The framework can be applied at the individual household level through five questions the household can answer for themselves: What is my carrying-cost wedge in my specific metro? What is my position in the metro saleability map (red, yellow, green)? What is the current MSI reading and trajectory? What is my income-quintile, age-cohort, and life-stage exposure to inflation components that the aggregate is missing? What is my household's 5-10 year saleability trajectory under current monetary policy assumptions? No official aggregate answers any of these questions. The framework provides a workable answer to each.

These five components together do not produce the inflation reading. They produce a diagnostic system — a set of measurements that, taken together, give the household and the analyst a substantially more accurate picture of the relevant monetary phenomenon than the official aggregates provide.

The framework's specific commitment, as Series Four continues to develop: the MSI dashboard implementation will move from specification to working reference implementation across 2026-2027. The metro saleability map will be updated quarterly. The carrying-cost wedge calculator will be released as a household-applicable tool. The component readings will be integrated into a single publicly accessible dashboard hosted at newaustrianeconomics.com, with full source code available for inspection. The framework will, in operational form, do what ShadowStats has been promising for two decades and failing to actually deliver: produce a rigorous, methodologically defensible, household-applicable alternative monetary diagnostic system.

The closing observation

The April 2026 CPI reading of 3.8% is not, in the framework's vocabulary, wrong. It is structurally limited. It captures one specific aggregation of one specific basket through one specific set of methodology choices, and the reading is internally consistent with those choices. The reading also bears an increasingly tenuous relationship to what individual households are actually experiencing, what specific asset classes are actually doing, what regional housing markets are actually producing, and what the monetary substrate is actually experiencing in the way of accumulated stress.

The framework's job is to make this gap visible. Conventional economic discussion treats the CPI as approximately the right measure of inflation, the 2% target as approximately the right policy objective, and the GDP figure as approximately the right summary of economic output. The framework's reading is that all three are approximately aligned with reality in ways that allow the institutions producing them to continue producing them without acute political consequence, while diverging from the lived experience of households in ways that compound steadily over time.

The 2% inflation target is policy-mandated capital erosion at a specifiable rate, anchored in monetarist macroeconomics rather than Mengerian saleability, mathematically guaranteed to halve currency purchasing power within ordinary career horizons. The CPI has been altered through methodology shifts since 1996 that systematically reduce measured inflation by approximately 1.0-1.3 percentage points per year relative to the pre-Boskin methodology. The household-level experience of inflation diverges from the aggregate in structurally irreducible ways tied to geography, income, age, life stage, consumption mix, and time horizon. And the framework, after twenty essays' worth of accumulated diagnostic work, now has the apparatus to produce alternative measurements that are practitioner-grade, household-applicable, and methodologically defensible in ways the existing alternative-inflation literature has not achieved.

The aggregates will continue to be reported. The Federal Reserve will continue to describe 2% as the price-stability target. The month-by-month CPI releases will continue to dominate financial-press coverage of monetary conditions. The framework's contribution is to make visible what those readings cannot capture, and to provide the household, the analyst, and the practitioner with the diagnostic apparatus needed to navigate monetary reality on terms more accurate than the aggregates allow.

The next installment of Watching the Cracks will engage the FDIC Q1 2026 Quarterly Banking Profile, due for release in late May. The framework's predictions from Article 16 are now overdue for testing. The watching continues. The measurement program is now coherent enough to publish in its own right.


This is the fifth installment of "Watching the Cracks." A companion essay, "The Operational Substitute Layer: A Firsthand Account from Inside the Machinery," launches a separate thread (provisionally titled "Inside the Substitute Layer") that draws on direct institutional experience rather than external data, beginning with the author's work building one of the major New Zealand banks' RMBS infrastructure during the 2008-2013 period.
