Bowen-Hoberg-Fresard Patent-Text Data

About These Measures

The Bowen, Hoberg, and Fresard measures provided here are based on the text of the section of each patent that describes the innovation.

They give researchers a new way to characterize innovation within public firms, startups, locations, and more. Importantly, they are distinct from existing measures and have no look-ahead bias: they use only information available in the patent itself.

This Data is Provided by

Donald Bowen (Lehigh University),
Laurent Fresard (Università della Svizzera italiana),
and Gerard Hoberg (University of Southern California)

The measures were developed in “Rapidly Evolving Technologies and Startup Exits,” which is forthcoming in Management Science. Please cite that study when using or referring to any data or code in this repository.


RETech

“RETech” measures whether a patent pertains to a technological area that is rapidly evolving (i.e., following breakthroughs) or stable.

Higher values of our measure identify patents in new areas and in subsequent waves of development. High-RETech patents substitute for existing technologies rather than complement them, receive more citations, and elicit stronger stock market reactions.

Among measures without look-ahead bias, RETech has the strongest association with notable breakthrough patents (like lasers, DNA modifications, satellites, Google’s PageRank, and more).

Top 20 Patents by RETech

Among patents with application years from 2009 to 2018, these are the 20 with the highest RETech values.

Patent    RETech  NBER Category     App. Year  Grant Year  Title
8268964   93      Chemicals         2009       2012        MHC peptide complexes and uses thereof in infectious diseases
10336808  93      Drugs & Medicine  2012       2019        MHC peptide complexes and uses thereof in infectious diseases
10633715  91      Drugs & Medicine  2016       2020        Gene controlling shell phenotype in palm
9481889   91      Drugs & Medicine  2013       2016        Gene controlling shell phenotype in palm
9067987   91      Drugs & Medicine  2012       2015        Neisserial antigenic peptides
8394390   91      Drugs & Medicine  2012       2013        Neisserial antigenic peptides
11047011  90      Drugs & Medicine  2016       2021        Immunorepertoire normality assessment method and its use
8624015   90      Chemicals         2012       2014        Probe set and method for identifying HLA allele
10607717  89      Comps & Commun    2014       2020        Method for subtyping lymphoma types by means of expression profiling
11028394  89      Drugs & Medicine  2016       2021        CRISPR/CAS-related methods and compositions for treating cystic fibrosis
10405749  88      Drugs & Medicine  2015       2019        RNA agents for P21 gene modulation
9765315   87      Drugs & Medicine  2013       2017        Cellulose and/or hemicelluloses degrading enzymes from Macrophomina phaseolina and uses thereof
10655102  87      Drugs & Medicine  2014       2020        Identification and isolation of human corneal endothelial cells (HCECS)
8110199   87      Drugs & Medicine  2009       2012        Streptococcus pneumoniae proteins and nucleic acid molecules
10731174  86      Drugs & Medicine  2018       2020        Plants showing a reduced wound-induced surface discoloration
10835585  86      Drugs & Medicine  2016       2020        Shared neoantigens
9700502   84      Drugs & Medicine  2010       2017        Methods for generating new hair follicles, treating baldness, and hair removal
9642789   84      Drugs & Medicine  2010       2017        Methods for generating new hair follicles, treating baldness, and hair removal
10570457  84      Drugs & Medicine  2015       2020        Methods for predicting drug responsiveness
10954517  83      Drugs & Medicine  2018       2021        Methods and compositions for the specific inhibition of complement component 5(C5) by double-stranded RNA
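A ranking like the one above can be rebuilt from the patent-level file by filtering on application year and sorting by RETech. The sketch below is illustrative only: it assumes each patent is a Python dict with hypothetical keys `patent`, `retech`, and `app_year` (the dataset's actual column names may differ), and it breaks ties on patent number, so tied patents may appear in a different order than in the table.

```python
from typing import Dict, List

# Hypothetical record layout: "patent", "retech", and "app_year" are
# illustrative key names, not necessarily the dataset's real columns.
Patent = Dict[str, int]


def top_by_retech(patents: List[Patent], first_year: int, last_year: int,
                  n: int = 20) -> List[Patent]:
    """Return the n patents with the highest RETech among those whose
    application year lies in [first_year, last_year] (inclusive)."""
    in_window = [p for p in patents
                 if first_year <= p["app_year"] <= last_year]
    # Highest RETech first; ties broken by patent number so the order
    # is deterministic.
    in_window.sort(key=lambda p: (-p["retech"], p["patent"]))
    return in_window[:n]


if __name__ == "__main__":
    # A few rows copied from the table above.
    sample = [
        {"patent": 8268964, "retech": 93, "app_year": 2009},
        {"patent": 10336808, "retech": 93, "app_year": 2012},
        {"patent": 10633715, "retech": 91, "app_year": 2016},
        {"patent": 9481889, "retech": 91, "app_year": 2013},
        {"patent": 8110199, "retech": 87, "app_year": 2009},
    ]
    # Narrowing the window to 2010-2018 drops the two 2009 applications.
    for p in top_by_retech(sample, 2010, 2018, n=3):
        print(p["patent"], p["retech"])
```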

Tech Breadth

“Tech Breadth” measures how much (or how little) a patent’s text is spread across technological fields. Patents with low breadth (e.g., a breadth of 0) are niche and can be understood by scientists familiar with a single field of study. High breadth values indicate that the patent draws on ideas from many fields and will likely require teams with diverse knowledge to implement. As such, we expect low-breadth patents to be more redeployable and more complementary to technology stacks outside the inventing firm.
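The paper's exact construction of Tech Breadth is not reproduced here. As a generic illustration of what "spread across technological fields" can mean numerically, the sketch below scores a patent as 1 minus the Herfindahl index of its weights across fields. This is an assumed stand-in formula, not the authors' method, and the field names and weights are hypothetical. Like the description above, it gives 0 for a patent concentrated in a single field and rises as the text spreads over more fields.

```python
from typing import Mapping


def breadth_one_minus_hhi(field_weights: Mapping[str, float]) -> float:
    """Illustrative breadth score (NOT the paper's measure): 1 minus the
    Herfindahl index of a patent's non-negative weights across fields.

    0.0 means all weight sits in one field (a niche patent); the score
    rises toward 1 as weight is spread evenly over more fields.
    """
    if any(w < 0 for w in field_weights.values()):
        raise ValueError("field weights must be non-negative")
    total = sum(field_weights.values())
    if total <= 0:
        raise ValueError("field weights must sum to a positive number")
    # Normalize weights to shares, then sum squared shares (the HHI).
    shares = [w / total for w in field_weights.values()]
    return 1.0 - sum(s * s for s in shares)


if __name__ == "__main__":
    # Hypothetical field names and weights.
    print(breadth_one_minus_hhi({"field_A": 1.0}))                  # 0.0
    print(breadth_one_minus_hhi({"field_A": 1.0, "field_B": 1.0}))  # 0.5
    print(breadth_one_minus_hhi({"field_A": 3.0, "field_B": 1.0}))  # 0.375
```

An even split across k fields gives 1 - 1/k, so under this stand-in formula the score approaches but never reaches 1 for a finite number of fields.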

Usage Notes

1. The dataset contains raw (i.e., not winsorized) values; we recommend winsorizing them by application year before use.
2. We generally recommend organizing the data by application year (rather than grant year), which best matches the timing of the innovation.
3. You can use this Stata function to convert patent-level variables into group-time variables (e.g., firm-year, state-year, MSA-quarter).
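The first and third notes can be sketched together in plain Python: winsorize a patent-level variable within each application year, then average it up to firm-year cells. This is not the Stata function mentioned in note 3. The 1st/99th percentile cutoffs, the linear-interpolation percentile, the use of the mean as the aggregator, and the `firm`, `app_year`, and `retech` keys are all assumptions for illustration; the patent-level data may not carry a firm identifier directly.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Row = Dict[str, object]


def percentile(sorted_vals: List[float], q: float) -> float:
    """Linear-interpolation percentile of a sorted, non-empty list (q in [0, 1])."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def winsorize_by_year(rows: List[Row], var: str, year_key: str = "app_year",
                      lower: float = 0.01, upper: float = 0.99) -> List[Row]:
    """Clip `var` to its [lower, upper] percentiles within each year.

    Returns new dicts and leaves the input rows unchanged. The 1%/99%
    defaults are an assumed choice, not one stated by the data providers.
    """
    by_year: Dict[object, List[float]] = defaultdict(list)
    for r in rows:
        by_year[r[year_key]].append(r[var])
    # Compute clipping bounds separately for each application year.
    bounds = {}
    for year, vals in by_year.items():
        vals.sort()
        bounds[year] = (percentile(vals, lower), percentile(vals, upper))
    out = []
    for r in rows:
        lo, hi = bounds[r[year_key]]
        out.append({**r, var: min(max(r[var], lo), hi)})
    return out


def group_year_mean(rows: List[Row], var: str, group_key: str = "firm",
                    year_key: str = "app_year") -> Dict[Tuple[object, object], float]:
    """Average a patent-level variable within each (group, year) cell."""
    cells: Dict[Tuple[object, object], List[float]] = defaultdict(list)
    for r in rows:
        cells[(r[group_key], r[year_key])].append(r[var])
    return {cell: mean(vals) for cell, vals in cells.items()}


if __name__ == "__main__":
    # Toy rows with made-up values, only to show the mechanics.
    rows = [
        {"firm": "A", "app_year": 2010, "retech": 0.0},
        {"firm": "A", "app_year": 2010, "retech": 10.0},
        {"firm": "B", "app_year": 2010, "retech": 20.0},
        {"firm": "B", "app_year": 2010, "retech": 30.0},
        {"firm": "B", "app_year": 2010, "retech": 40.0},
    ]
    # Wide cutoffs so the clipping is visible with only five observations.
    clipped = winsorize_by_year(rows, "retech", lower=0.25, upper=0.75)
    print(group_year_mean(clipped, "retech"))
```

Winsorizing before aggregating keeps a few extreme patents from dominating a firm-year average; swapping `firm` for a state or MSA key gives the other group-time variants listed in note 3.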

Please see the paper for details on how the measures are constructed. Questions can be directed to Donald Bowen; reports of errors or omissions, as well as corrections, are welcome.

Power Users Only

A GitHub repo contains our code library, which downloads Google Patents pages, parses and cleans the text, and creates the variables. Users interested in building measures from patent text, or in modifying ours, should start there.