Bowen-Hoberg-Fresard Patent-Text Data

About These Measures

The Bowen, Hoberg, and Fresard measures provided here are based on the text in the section of patents describing the innovation.

They provide researchers a new way to characterize innovation within public firms, startups, places and more. Importantly, they are distinct from existing measures and do not have look-ahead bias: they only use information available in the patent itself.

This Data is Provided by

Donald Bowen (Lehigh University),
Laurent Fresard (Università della Svizzera italiana),
and Gerard Hoberg (University of Southern California)

and developed within “Rapidly Evolving Technologies and Startup Exits,” which is forthcoming in Management Science. Please cite that study when using or referring to any data or code in this repository.

RETech

“RETech” measures whether the patent pertains to a technological area that is rapidly evolving (i.e., following breakthroughs) or stable. Higher levels of the measure detect patents in new areas and those in subsequent waves of development. High-RETech patents substitute for existing technologies rather than complement them, receive more citations, and elicit stronger stock market reactions. Among measures without look-ahead bias, RETech has the strongest association with notable breakthrough patents (like lasers, DNA modifications, satellites, Google’s PageRank, and more).


Tech Breadth

“Tech Breadth” measures how much (or little) the patent’s text is spread across technological fields. Patents with low levels of breadth (i.e., 0) are niche and can be understood by scientists familiar with a single field of study. High values of breadth indicate that the patent draws on ideas from many fields and will likely require teams with diverse knowledge to implement. As such, we expect low-breadth patents to be more redeployable and more complementary to technology stacks outside the inventing firm.

Usage Notes

1. The dataset contains raw values (i.e., not winsorized), and we recommend winsorizing them by application year before use.
2. We generally recommend organizing the data by application year, which best matches the timing of the innovation.
3. You can use this Stata function to convert patent-level variables into group-time variables (e.g. firm-year, state-year, MSA-quarter).
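
For readers working outside Stata, the notes above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' code: the field names `app_year`, `firm_id`, and `retech` are hypothetical, the 1st/99th percentile cutoffs are a common convention rather than a recommendation from the paper, and aggregating by a simple mean is an assumption (the Stata function mentioned in note 3 may aggregate differently).

```python
from collections import defaultdict
from statistics import mean


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of a pre-sorted list, with q in [0, 1]."""
    if len(sorted_vals) == 1:
        return sorted_vals[0]
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def winsorize_by_year(rows, var, lower=0.01, upper=0.99):
    """Return copies of rows with `var` clipped to percentile bounds
    computed separately within each application year."""
    by_year = defaultdict(list)
    for r in rows:
        by_year[r["app_year"]].append(r[var])  # "app_year" is a hypothetical field
    bounds = {}
    for year, vals in by_year.items():
        vals.sort()
        bounds[year] = (percentile(vals, lower), percentile(vals, upper))
    out = []
    for r in rows:
        lo, hi = bounds[r["app_year"]]
        out.append({**r, var: min(max(r[var], lo), hi)})  # input rows are not mutated
    return out


def to_group_year(rows, var, group="firm_id"):
    """Average `var` within each (group, application year) cell.
    Using the mean is an assumption made for this sketch."""
    cells = defaultdict(list)
    for r in rows:
        cells[(r[group], r["app_year"])].append(r[var])
    return {key: mean(vals) for key, vals in cells.items()}


if __name__ == "__main__":
    # Made-up patent-level records, for illustration only.
    patents = [
        {"firm_id": "A", "app_year": 2000, "retech": 0.2},
        {"firm_id": "A", "app_year": 2000, "retech": 0.4},
        {"firm_id": "B", "app_year": 2000, "retech": 9.0},
        {"firm_id": "B", "app_year": 2001, "retech": 0.7},
    ]
    clipped = winsorize_by_year(patents, "retech")
    print(to_group_year(clipped, "retech"))
```

Winsorizing within each application year, rather than over the pooled sample, keeps the cutoffs from being driven by years with unusually many or unusually extreme patents. The same two steps apply to other groupings (e.g. state-year) by passing a different `group` field.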

Please see the paper for details on the construction of the measures. Questions can be directed to Donald Bowen; pointers to errors or omissions, as well as corrections, are welcome.

Power Users Only

A GitHub repo contains our code library that downloads Google patent pages, parses and cleans the text, and creates variables. Users interested in creating measures from patent text or modifying ours should go there.