Groq is headquartered in
Mountain View, CA, and has offices in San Jose, CA; Liberty Lake, WA; Toronto, Canada; and London, U.K., as well as remote employees throughout North America and Europe.
History
Groq was founded in 2016 by a group of former
Google engineers, led by Jonathan Ross, one of the designers of the
Tensor Processing Unit (TPU), an AI accelerator ASIC, and Douglas Wightman, an entrepreneur and former engineer at Google X (known as
X Development).[8]
Groq received seed funding from
Social Capital's
Chamath Palihapitiya, with a $10 million investment in 2017[9] and soon after secured additional funding.
On March 1, 2022, Groq acquired Maxeler Technologies, a company known for its dataflow systems technologies.[14]
On August 16, 2023, Groq selected Samsung Electronics' foundry in
Taylor, Texas, to manufacture its next-generation chips on Samsung's 4-nanometer (nm) process node. This was the first order at the new Samsung chip factory.[15]
On February 19, 2024, Groq soft-launched a developer platform, GroqCloud, to attract developers to the Groq API.[16] On March 1, 2024, Groq acquired Definitive Intelligence, a startup offering a range of business-oriented AI solutions, to help build out its cloud platform.[17]
Technology
A die photo of Groq’s LPU V1
Groq initially named its ASIC the Tensor Streaming Processor (TSP), but later rebranded it as the Language Processing Unit (LPU).[1][18][19]
The LPU features a functionally sliced
microarchitecture, where memory units are interleaved with vector and matrix computation units.[20][21] This design facilitates the exploitation of
dataflow locality in AI compute graphs, improving execution performance and efficiency. The LPU was designed based on two key observations:
AI workloads exhibit substantial data parallelism, which can be mapped onto purpose-built hardware, leading to performance gains.[20][21]
A deterministic processor design, coupled with a producer-consumer programming model, allows precise control over and reasoning about hardware components, enabling optimized performance and energy efficiency.[20][21]
In addition to its functionally sliced microarchitecture, the LPU is characterized by its single-core, deterministic architecture.[20][22] The LPU achieves deterministic execution by avoiding traditional reactive hardware components (branch predictors, arbiters, reordering buffers, caches)[20] and by having all execution explicitly controlled by the
compiler, thereby guaranteeing determinism in the execution of an LPU program.[21]
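The compiler-controlled execution model described above can be illustrated with a simplified sketch. All names and latencies below are hypothetical, not Groq's actual toolchain or ISA; the sketch only shows the general idea of static scheduling, where every operation is assigned a fixed issue cycle at compile time, so runtime behavior is fully predictable.

```python
# Illustrative sketch of statically scheduled (deterministic) execution.
# Hypothetical schedule: each tuple is (issue cycle, operation, description),
# fixed at "compile time" rather than decided by reactive hardware.
schedule = [
    (0, "load",   "weights -> memory slice 0"),
    (1, "load",   "activations -> memory slice 1"),
    (2, "matmul", "slice 0 x slice 1 -> matrix unit"),
    (5, "store",  "matrix unit -> memory slice 2"),  # matmul latency assumed: 3 cycles
]

def execute(schedule):
    """Replay the compile-time schedule. The timing trace is identical on
    every run because there are no caches, branch predictors, or arbiters
    whose runtime state could change when each operation issues."""
    return [(cycle, op) for cycle, op, _ in schedule]

# Two runs produce identical traces: execution is deterministic.
assert execute(schedule) == execute(schedule)
```

In this model, the burden of avoiding hazards and contention falls entirely on the compiler, which must know every operation's latency in advance; the payoff is that performance can be reasoned about exactly rather than statistically.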
The first generation of the LPU (LPU v1) yields a computational density of more than 1 TeraOp/s per square mm of silicon for its 25×29 mm 14 nm chip operating at a nominal clock frequency of 900 MHz.[20] The second generation of the LPU (LPU v2) will be manufactured on Samsung's 4 nm process node.[15]
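The density figure above implies an aggregate throughput: at more than 1 TeraOp/s per mm² over a 25 mm × 29 mm (725 mm²) die, the chip delivers over 725 TeraOp/s in total. A brief arithmetic check (using the cited figures; the density is a stated lower bound):

```python
# Back-of-the-envelope check of the LPU v1 computational density figure.
die_width_mm, die_height_mm = 25, 29
density_tops_per_mm2 = 1.0  # ">1 TeraOp/s per square mm" (lower bound)

die_area_mm2 = die_width_mm * die_height_mm            # 725 mm^2
aggregate_tops = die_area_mm2 * density_tops_per_mm2   # > 725 TeraOp/s

print(die_area_mm2, aggregate_tops)  # 725 725.0
```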
Performance
Groq was the first API provider to exceed a generation rate of 100 tokens per second while running Meta's
Llama 2 70B model.[23]
Groq currently hosts a variety of open-source large language models running on its LPUs for public access.[24] Access to these demos is available through Groq's website. The LPU's performance while running these open-source LLMs has been independently benchmarked by ArtificialAnalysis.ai against other LLM providers.[25] The LPU's measured performance is shown in the table below: