Generative AI in the Real World: The Startup Opportunity with Gabriela de Queiroz

Ben Lorica and Gabriela de Queiroz, director of AI at Microsoft, talk about startups: specifically, AI startups. How do you get noticed? How do you generate real traction? What are startups doing with agents and with protocols like MCP and A2A? And which security issues should startups watch for, especially if they’re using open weights models?
Check out other episodes of this podcast on the O’Reilly learning platform.
About the Generative AI in the Real World podcast: In 2023, ChatGPT put AI on everyone’s agenda. In 2025, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.
Points of Interest
- 0:00: Introduction to Gabriela de Queiroz, director of AI at Microsoft.
- 0:30: You work with a lot of startups and founders. How have the opportunities for startups in generative AI changed? Are the opportunities expanding?
- 0:56: Absolutely. The entry barrier for founders and developers is much lower. Startups are exploding—not just the amount but also the interesting things they are doing.
- 1:19: You catch startups when they’re still exploring, trying to build their MVP. So startups need to be more persistent about finding differentiation. If anyone can build an MVP, how do you distinguish yourself?
- 1:46: At Microsoft, I drive several strategic initiatives to help growth-stage startups. I also guide them in solving real pain points using our stacks. I’ve designed programs to spotlight founders.
- 3:08: I do a lot of engagements where I help startups go from prototype or MVP to impact. An MVP is not enough: I need to see a real use case, and I need to see some traction. When they have real customers, we see whether the MVP is actually working.
- 3:49: Are you starting to see patterns for gaining traction? Are they focusing on a specific domain? Or do they have a good dataset?
- 4:02: If they are solving a real use case in a specific domain or niche, this is where we see them succeed. They are solving a real pain, not building something generic.
- 4:27: We’re both in San Francisco, where solving a specific pain or finding a specific domain can mean something different. Techie founders can build something that’s used by their friends, but there’s no revenue.
- 5:03: This happens everywhere, but the culture around it is bigger here. I tell founders, “You need to show me traction.” We have several companies that started as open source and then built a paid layer on top of the open source project.
- 5:34: You work with the folks at Azure, so presumably you know what actual enterprises are doing with generative AI. Can you give us an idea of what enterprises are starting to deploy? What is the level of comfort of enterprise with these technologies?
- 6:06: Enterprises are a little bit behind startups. Startups are building agents; enterprises are not there yet. There’s a lot of heavy lifting on the data infrastructure that they need to have in place. And their use cases are complex. It’s similar to the big data era, when enterprises took longer to optimize their stacks.
- 7:19: Can you describe why enterprises need to modernize their data stack?
- 7:42: Reality isn’t magic. There’s a lot of complexity in data and how data is handled. There are a lot of data security and privacy concerns that startups aren’t aware of but that are important to enterprises. Even the data itself is an issue: it isn’t well organized, and different teams use different data sources.
- 8:28: Is RAG now a well-established pattern in the enterprise?
- 8:44: It is. RAG is part of everybody’s workflow. (A minimal sketch of the pattern appears after this list.)
- 8:51: The common use cases that seem to be further along are customer support, coding—what other buckets can you add?
- 9:07: Customer support and tickets are among the main pains and use cases. And they are very expensive. So it’s an easy win for enterprises when they move to GenAI or AI agents.
- 9:48: Are you saying that the tool builders are ahead of the tool buyers?
- 10:05: You’re right. I talk a lot with startups building agents. We discuss where the industry is heading and what the challenges are. If you think we are close to AGI, try to build an agent and you’ll see how far we are from AGI. When you want to scale, there’s another level of difficulty. When I ask for real examples and customers, the majority are not there yet.
- 11:01: Part of it is the terminology. People use the term “agent” even for a chatbot. There’s a lot of confusion. And startups are hyping the notion of multiagents. We will get there, but let’s start with single agents first. And you still need a human in the loop.
- 11:40: Yes, we talk about the human in the loop all the time. Even people who are bragging, when you ask them to show you, they’re not there yet.
- 12:00: On the agent front, if I asked you for a short presentation with three slides of examples that caught your attention, what would they be?
- 12:30: There’s a company building an AI agent around email and calendars. Everyone uses email and calendars all day long. If you want to schedule dinner with a group of friends, some of whom have dietary restrictions, it would take forever to find a restaurant that checks all the boxes. There’s a company trying to make this automatic.
- 14:22: In recent months, developers have rallied around MCP and now A2A. Someone asked me for a list of vetted MCP servers. If the server comes from the company that developed the application, fine. But there are thousands of servers, and I’m wary. We already have software supply chain issues. Is MCP taking off, or is it a temporary fix? (A minimal MCP server sketch appears after this list.)
- 15:48: It’s too early to say that this is it. There’s also the Google protocol (A2A); IBM created a protocol; this is an ongoing discussion, and because it’s evolving so fast, something will probably come in the next few months.
- 16:31: It’s very much like the internet and the standards that emerged from it. You can make it formal, or you can just build it, grow it, and somehow it becomes a de facto open standard.
- 17:15: We’re implicitly talking about text. Have you started to see near-production use cases involving multimodal models?
- 17:37: We’ve seen some use cases with multimodality, which is more complex.
- 17:48: Now you have to expand your data strategy to all these different data types.
- 18:07: Going back to the slides: If I had three slides, I’d try to get everyone on the same page about what an AI agent is. All the big companies have their own definitions. I’d set the stage with my definition: a system that can take action on your behalf. (A minimal single-agent loop is sketched after this list.) Then I’d say, if you think we’re close to AGI, try to build an agent. And the third slide would be to build one agent rather than a multiagent system. Start small, and then you can scale, not the other way around.
- 19:44: Orchestration of one agent is one thing. A lot of people throw around the term orchestration. For data engineering, orchestration means something specific, and a lot goes into it, even for a single agent. For multiagents, it’s a lot more complex. There’s orchestration and there’s communication too. An agent may withhold, ignore, or misunderstand information. So stick with one agent. Get that done and move forward.
- 20:33: The big thing in the foundation model space is reasoning. What has reasoning opened up for some of these startups? What applications rely on a reasoning-enhanced model? What model should I use, and can I get by with a model that doesn’t reason?
- 21:15: I haven’t seen any startup using reasoning yet. Probably because of what you are talking about. It’s expensive, it’s slower, and startups need to see wins fast.
- 21:46: They just ask for more free credits.
- 21:51: Free credits are not forever. But it’s not even the cost—it’s also the process and the waiting. What are the trade-offs? I haven’t seen startups talking with me about using reasoning.
- 22:22: The sound advice for anyone building anything is to be model agnostic: design what you’re doing so you can use multiple models or switch models. (One way to do this is sketched after this list.) We now have open weights models that are becoming more competitive. Last year we had Llama; now we also have Qwen and DeepSeek, with an incredible release cadence. Are you seeing more startups opting for open weights?
- 23:19: Definitely. But they need to be very careful when they use open models because of security. I see a lot of companies using DeepSeek. I ask them about security.
- 23:43: In the open weights world, you can have derivative models. Who vets the derivatives? Proprietary models have a lot more control. And there are supply chain risks, though they’re not unique to open weights models. We all depend on Python and Python libraries.
- 25:17: And people fork derivative models… We’ve seen this with products as well: people building products and becoming profitable on top of open source projects. People have built on a fork of a Python project, or on top of Python libraries, and become profitable.
- 25:55: With the Chinese open weights models, I’ve talked to security people, and there’s nothing inherently insecure about using the weights. There might be architectural differences. But if you’re using one of the Chinese models through their hosted API, they might have to turn over data. Generally, access to the weights isn’t a common attack vector.
- 27:03: Or you can use companies like Microsoft. We have DeepSeek R1 available on Azure. But it’s gone through rigorous red-teaming and safety evaluation to mitigate risks.
- 27:39: There are differences in terms of alignment and red-teaming between Western and Chinese companies.
- 28:26: In closing, are there any parallels between what you’re seeing now and what we saw in data science?
- 28:40: It’s similar, but the scale and velocity are different. There are more resources, and everything is more accessible. The barrier to entry is lower.
- 29:06: The hype cycle is the same. You remember all the stories about “Data science is the sexy new job.” But the technology is now much more accessible, and there are a lot more stories and more excitement.
- 29:29: Back then, we only had a few options: Hadoop, Spark… Not like 100 different models. And they weren’t accessible to the general public.
- 30:03: Back then people didn’t need Hadoop or MapReduce or Spark if they didn’t have lots of data. And now, you don’t have to use the brightest or best-benchmarked LLM; you can use a small language model.
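
A few of the patterns discussed above, sketched in code. First, the RAG pattern that Gabriela says is now part of everybody’s workflow. This is a minimal, self-contained sketch: the bag-of-words “embedding,” the toy document store, and the stubbed generate() call are illustrative stand-ins, not any particular product’s API.

```python
from collections import Counter
from math import sqrt

# Toy document store. In a real system these would live in a vector
# database, embedded with a trained embedding model.
DOCS = [
    "Refunds are processed within 5 business days.",
    "Support tickets are answered within 24 hours.",
    "Enterprise plans include a dedicated account manager.",
]

def embed(text: str) -> Counter:
    """Bag-of-words stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by similarity to the query; keep the top k."""
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def generate(prompt: str) -> str:
    """Stand-in for an LLM call (any chat-completion endpoint would go here)."""
    return f"[LLM answer grounded in a prompt of {len(prompt)} characters]"

def rag(query: str) -> str:
    # Retrieval-augmented generation: retrieve context, then ground the answer in it.
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)

print(rag("How fast are support tickets handled?"))
```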
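Next, MCP. Part of why servers have multiplied faster than anyone can vet them is that standing one up takes only a few lines. The sketch below follows the quickstart pattern of the official MCP Python SDK (pip install "mcp[cli]") as of this writing; the restaurant_search tool is a hypothetical example, not a real service.

```python
# A minimal MCP server exposing one tool over stdio.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def restaurant_search(cuisine: str, dietary: str) -> str:
    """Suggest a restaurant matching a cuisine and a dietary restriction."""
    # A real server would call an actual search API here; this is a stub.
    return f"Suggested: a {cuisine} restaurant with {dietary} options."

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```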
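Third, the agent definition: “a system that can take action on your behalf,” with the human in the loop that both speakers insist on. A minimal single-agent loop; model_decide is a stub standing in for a real LLM call, and the tools are hypothetical.

```python
# One agent, a small tool set, and a human approval gate before any action.
TOOLS = {
    "send_email": lambda to, body: f"email sent to {to}",
    "create_event": lambda title, when: f"event '{title}' scheduled for {when}",
}

def model_decide(goal: str) -> dict:
    """Stand-in for an LLM that picks a tool and arguments for the goal."""
    return {"tool": "create_event", "args": {"title": goal, "when": "Friday 7pm"}}

def run_agent(goal: str) -> str:
    plan = model_decide(goal)
    tool, args = plan["tool"], plan["args"]
    # Human in the loop: the agent proposes; a person approves before it acts.
    answer = input(f"Agent wants to call {tool}({args}). Approve? [y/N] ")
    if answer.strip().lower() != "y":
        return "Action rejected by human reviewer."
    return TOOLS[tool](**args)

print(run_agent("Dinner with friends"))
```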
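Finally, model agnosticism. One common way to keep models swappable is to hide every provider behind a single interface, so that moving from Llama to Qwen or DeepSeek is a configuration change rather than a rewrite. A minimal sketch, with stubbed backends standing in for real SDK calls:

```python
from typing import Protocol

class ChatModel(Protocol):
    """The one interface the rest of the application is allowed to see."""
    def complete(self, prompt: str) -> str: ...

class StubModel:
    """Stand-in backend; a real one would wrap Azure, vLLM, llama.cpp, etc."""
    def __init__(self, name: str):
        self.name = name
    def complete(self, prompt: str) -> str:
        return f"[{self.name} reply to: {prompt!r}]"

MODELS: dict[str, ChatModel] = {
    "llama": StubModel("llama-3"),
    "qwen": StubModel("qwen-2.5"),
    "deepseek": StubModel("deepseek-r1"),
}

def answer(query: str, model: str = "llama") -> str:
    # Switching models is a config change, not a code change.
    return MODELS[model].complete(query)

print(answer("Summarize this support ticket", model="deepseek"))
```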