We caught up with Richard Zhang, Augmenta's VP of AI and R&D, after a few weeks on the road presenting at 3DV and CDFAM — two conferences where the hardest problems in spatial AI and computational design are being worked out. Here's what he brought back.
3DV and CDFAM are very different conferences: the former is an academic gathering and the latter an industry-focused event. I gave my 3DV talk at the Area Chair Workshop, which hosted leading computer vision researchers. Later, I also joined a panel discussion on Foundation Models with thought leaders in the field. At CDFAM, which was single-track, the audience came predominantly from industry, with many start-up companies present. Most conversations about 3D Generative AI focused on the data challenge, surrogate models, finding the right data models or representations, and the contrast between generic and bespoke Foundation Models. Interestingly, the question "Is 3D even needed?" kept coming up.
To me, the most obvious contrast is that most companies are fully content to be mere consumers of the latest and shiniest AI models. Understandably, they do not carry the burden of advancing the state of the art. The research community is dutifully taking on the challenge of identifying the weaknesses of existing models and finding innovative ways to improve them. A leading thesis is that contemporary large models lack the basic spatial and physical precision, or intelligence, needed to solve real problems in the physical world, and that scaling alone will not resolve this. There has been much talk about developing bespoke (i.e., specialist) models.
I think the construction industry is at a crossroads. By scale, construction dwarfs the aerospace and automotive sectors, even when combined, and still outsizes manufacturing, yet it remains the laggard in AI adoption. Our industry also faces a unique challenge: the cost of design automation does not amortize over high volumes. As my CEO, Francesco Iorio, puts it: you can sell a million copies of an iPhone design, but no two buildings are ever meant to be identical. Complexities like this should not be a deterrent to AI adoption, but a catalyst. AI must accelerate in construction to unlock productivity, sustainability, and waste reduction. Every signal I can gather suggests that 2026 will be an inflection point: the year we shift from merely consuming AI to fundamentally innovating with it.
Honestly, I did not come to Augmenta with a full appreciation of the difficulty and relevance of the MEP (mechanical, electrical, and plumbing) problem, especially since all I could associate with plumbing was our plumber coming to the house to fix what was under the sink or behind the toilet. But the research I had been doing in the years prior, especially in spatial AI and, looking forward, functional AI, made me gravitate strongly toward applying AI to the physical domain. At the same time, I was increasingly aware that, in terms of data scale and model capabilities, the challenges at the scene level (from an indoor room up to a whole building) are significantly greater than those at the object level. Architecture and construction clearly offer the tallest challenge of all. Researchers like me always welcome a challenge.
Being "generative" is the process, while being "functional" is the goal. Why would you want to produce a 3D design? Unlike an image or a video, it is not only for one's viewing pleasure! A 3D design, such as that of a building, is meant to be used or physically constructed, so its value is measured by how it performs in the real world. We want to automate the design of a fully engineered 3D building that is ready for construction.
The gap is not specific to AI models. In research, a good outcome typically ends up in a high-quality publication, where success is measured by whether it outperforms state-of-the-art baselines, not by whether it delivers a product to real customers. What I learned from my time at Amazon was that an algorithm achieving 90% accuracy on a well-known benchmark is not the final answer. Every customer-facing result must be rigorously QA-ed, and the error tolerance is practically zero. That was when I developed not only an appreciation for, but a strong gravitation toward, human-centred workflows such as active learning. The stakes differ by domain, of course. In e-commerce, a misinformed customer is unhappy; in construction, errors are far less tolerable, as they can lead to serious safety consequences and significant financial loss. This is why I believe AI models in construction should not seek full automation. They must work alongside designers in an iterative workflow, with fast surrogate models providing meaningful feedback, while continuously learning from every correction and decision made along the way.
MEP, for sure. To my knowledge, most research in AI and visual computing has focused on the architectural (e.g., facade modeling) and structural (e.g., floorplans) elements of buildings. Yet MEP, which turns out to be the hardest problem in practice in terms of coordination and time consumption, is almost never touched.
AI is here to stay, and by 2026, it is already becoming a way of life. While AI models may never fully automate your construction projects to meet all of your requirements, they will become indispensable tools to reduce project completion time, cost, and waste, while improving sustainability and design quality. In a competitive market, clients will always gravitate toward contractors who can demonstrate these advantages. The choice is not whether AI will reshape construction, but whether you will be among those who shape how it does. I believe that every company will adopt AI — those who do not embrace it now will find themselves falling behind, not gradually, but decisively.
Non-residential buildings are larger, more complex, and more varied in function than their residential counterparts. They do not follow cookie-cutter design patterns, so the cost of designing each one cannot be amortized over high volumes the way a repeated residential floor plan can. Every hospital, data center, or school is essentially bespoke, with its own site conditions, programmatic requirements, regulatory constraints, and MEP configurations. This is precisely where the economic case for automation is strongest, since there is no other lever to pull on design cost.
At a high level, you can think of such a foundation model as a ChatGPT or Claude that answers questions and completes tasks specific to construction. It will be a bespoke, or specialist, model, not a generic one. The most critical difference, though, is that the foundation model I want to build goes far beyond question answering. It must be able to create 3D designs, reason about them, and alter them on demand for a contractor. All construction knowledge in textual form must be well aligned with its spatial manifestations.
I came away with a stronger conviction that we are on the right path, starting with addressing the problems of data scarcity and data access through generative design. At the same time, we must work on spatial alignment with current LLMs.
My honest advice: ignore tools that merely wrap existing LLMs in a construction-themed interface. If a tool's core capability is something you could approximate yourself with a few API calls and some prompt engineering, it is not worth your money. The real challenge in construction is spatial and functional AI: the ability to reason in 3D, generate valid and constructible designs, and edit them intelligently. That capability cannot be conjured from a general-purpose language model. So when evaluating any new tool, ask one simple question: can it actually work in 3D space, or does it only talk about it? The answer will tell you everything.