What lessons will we take from 2024's AI products into next year? And what better way to reflect on the year's advancements in AI than to ask AI itself? Today, Fraser and Nabeel use AI models like ChatGPT, Claude, and o1 to first look back and see whether the products and innovations of 2023 lived up to their hype in 2024. Then, for the show's own year in review, Nabeel puts these AI models to the test by feeding them episode transcripts and seeing which model produces a usable summary. Together, they also dive into the effectiveness of Google Gemini's Deep Research tool, the rise of agentic computing in 2024, and whether products ever fundamentally change or just get constantly re-invented.
00:00 - Introduction
00:48 - Reflecting on a Year of AI Releases
00:23 - The Rise of Agentic AI
00:38 - Google Gemini’s Deep Research
00:47 - Using AI Models to Review 2024 at Hallway Chat
00:56 - The Role of Taste in Making Great AI Products
00:06 - The Rollout of o1
00:06 - WebSim - Self Expression Through Software
00:36 - Same Products, Different Iterations
00:39 - A Second Look at Apple Intelligence
00:10 - Living Agentically with AI
00:18 - Final Thoughts
[00:00:00]
Nabeel Hyatt: I would say my biggest takeaway from this year is, this is the year where the AI workflow interface kind of came into focus. For me, it's this three-panel interface where you have a Dropbox of context, where now in both ChatGPT and Claude and others you can pull in a Word doc, you can connect it to Google Drive, whatever.
It's like, this is where you get context for me to talk. And then there's the chat window itself, which is still the user interface du jour of the way that I interact with this model. And then there's the template, the tablet, or the play space, the Claude artifact area, where the model is now rendering something for you.
Fraser Kelton: The thing that I love, it might be one of my most beloved AI features of 2024, is the summaries that occur in mail and in messages. It is so good. I don't know.
Nabeel Hyatt: I don't know, man. It's just [00:01:00] bad. I disagree. I respectfully disagree. I do use the messages summary feature, and I find that it's not great.
Fraser Kelton: It fits the amount of characters perfectly, and it gives you the gist of what's been discussed in these emails. It's delightful. And you're looking at me like I'm a maniac.
Nabeel Hyatt: Hey everybody, I'm Nabeel. Welcome to Hallway Chat.
Fraser Kelton: Welcome to Hallway Chat, it's Fraser. Welcome back.
Nabeel Hyatt: This is take two, because we forgot to hit record, and got five minutes into a wonderful discussion that we will try to impromptu play back now, because as you know, these are not the most scripted things on the planet.
Fraser Kelton: It is year end. We wanted to reflect on what has been a crazy month of releases in AI from a bunch of different parties, and try to figure out how to make sense of it. That's maybe the summary of what we wanted to discuss [00:02:00] today. Is that fair?
Nabeel Hyatt: Let me pop it up a level. You know, when we had a conversation about this, the December releases are interesting.
There's been an unbelievable number of things coming in over the last couple of weeks. But I'm less interested in hot takes for the week, because neither of us is trying to turn this into some situation where we hop on a Zoom and have a conversation about headlines, right? Here's the news.
Here's how we feel. I think it's a good opportunity to reflect back on 2024, and maybe, in the spirit of collective learning, to ask: what are the takeaways in how AI products evolved this year that we could write on some sticky notes and put on the wall next to us as we go into thinking about what we're working on next year?
Sounds good. I started by looking back at 23. So I spoke to ChatGPT, Claude, Gemini, you know, all the models. I feel like I have five therapists around me at all times giving me conflicting information. And I asked [00:03:00] them what happened in 23 in AI as a way of thinking about the trajectory we've been on.
I often forget how quickly this is all moving. Obviously in 2023 the big one was Sam Altman being briefly fired from OpenAI in November of 23. But other than that, 23 felt like, when you look back on it, the year that all of the big companies released something. That was Adobe Firefly, Canva Create, Spotify DJ, Bard,
Snapchat AI. Like literally, big company enters the chat: they just all came in with some product. Which, as we talked about a couple of weeks ago, if you reflect back on those launches now, I think everyone was quaking in their boots that the incumbents were going to win. And if you fast forward a year, I would imagine that most of those teams internally, and we've heard some of this, are kind of disappointed with the penetration of those products.
It didn't really work. And then the second thing I noticed [00:04:00] when I looked through a bunch of that stuff was: there were a couple of things that, quote unquote, hit the market that were not really products yet. We got this little glimmer of the future, but it was going to take time.
The summer of agentic stuff is a good example. And we saw some very early voice stuff. And then if you fast forward a year, I think that's when you get agentic reasoning: a little bit of what o1 is doing, a little bit of what companies like Cognition are doing. In coding, you're really seeing agentic work actually execute and do very interesting things.
Not across every product category and not across every vertical, but it feels like it's finally been productized. And, let's remind ourselves, that took a year, right?
Fraser Kelton: A year, year and a half. Yeah. Pretty wild, which isn't surprising, right? That summer, the models were a number of generations earlier than where they are today.
All of the tooling to stand these things up [00:05:00] hadn't really been built yet. And we hadn't gone and plumbed the surface to figure out which use cases work well and which ones don't. And we've made tremendous progress on all three of those vectors.
Nabeel Hyatt: So, with that in mind, what are the kinds of things you think launched in 24 that were a glimmer of what might be great products next year, but maybe weren't quite there yet?
Fraser Kelton: I played around at length with Deep Research. I don't know how to describe it, and if you try to describe how it fits into their product line, you'll just get lost. It is from Google; I think it's from the Gemini team. I think it's a brand new model that has been trained to do extensive web research.
And there's a UI within gemini.google.com, I think, that allows you to use it.
Nabeel Hyatt: I think because it's freaking hard to find.
Fraser Kelton: [00:06:00] It is.
Nabeel Hyatt: I think I had to ask you for help to find it the other day, and I'd already found it once beforehand. Yeah.
Fraser Kelton: Actually, you've got to go to gemini.google.com. And then I think you have to use the dropdown on what looks like the Gemini logo to switch it to the Gemini 1.5 Deep Research model, or something like that. It is undoubtedly a glimpse of the future. It is not necessarily a product today. But my guess is in six to 12 months, there's going to be a lot of different product experiences providing this type of value. And so what is it? The name is literal: it is deep research.
You do a search, and it goes and combs the web. In some cases it was finding like 86 different sources. And then it synthesizes those sources based on the question that you asked. And then it generates, I don't know, a long report. So long that one of the top-level features is to open it up in Google Docs, so that you have your traditional reading and editing experience [00:07:00] for it.
Awesome from a research perspective, right? Like the idea that it breaks down the line of inquiry that it has to go to the web to search for, and then it finds all of these sources. And along the way, the product experience is actually pretty nice. Like it says: here's my plan. Do you agree? Do you want to edit it?
Do you want to muck around with it?
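The flow Fraser describes (propose a plan, let the user confirm or edit it, comb sources per line of inquiry, then synthesize a report) can be sketched as a simple loop. This is a hypothetical sketch, not Google's implementation; the function names and the stubbed plan, search, and synthesize steps are all assumptions for illustration.

```python
def deep_research(question, propose_plan, confirm, search, synthesize):
    """Hypothetical plan-confirm-research loop: show the plan, let the
    user edit it, comb sources for each line of inquiry, synthesize."""
    plan = propose_plan(question)          # "here's my plan"
    plan = confirm(plan)                   # "do you agree? want to edit it?"
    sources = []
    for step in plan:                      # comb the web, step by step
        sources.extend(search(step))
    return synthesize(question, sources)   # long-form report over all sources

# Stubs stand in for real model and search calls so the sketch runs.
report = deep_research(
    "the rise of agentic AI in 2024",
    propose_plan=lambda q: ["define agentic AI", "notable 2024 launches"],
    confirm=lambda plan: plan,             # user accepts the plan unchanged
    search=lambda step: [f"source on {step}"],
    synthesize=lambda q, s: f"Report on {q}, citing {len(s)} sources",
)
```

The `confirm` hook is where the "here's my plan, do you want to edit it?" interaction would live before any searching starts.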
Nabeel Hyatt: The plan portion is a perfect example of the product leap from 23 to 24. No product manager would have launched that in 23, but we're in this kind of agentic, show-your-work 24. And so of course it shows you the plan and then gives you a chance to edit.
That's a very good point.
Fraser Kelton: That's right. And then, you know, another theme from the past little bit: it's a UI that handles latency measured in minutes or tens of minutes, rather than hundreds of milliseconds, because it's actually going out and doing work on your behalf. And then it comes back and gives you the report.
It feels like I'm looking at really compelling research with a [00:08:00] UI wrapped around it. And it feels like we're mostly looking at interesting research coming from the Gemini team, and it will get refined at the model layer. And in a year, year and a half, a lot of people will be having these really thorough, effortful research-and-synthesis flows within products.
But it doesn't feel like it's there today. You know what it feels like to me? Recently, when you were talking about NotebookLM and the podcast feature, you were saying how it works because they've made a lot of great product decisions within the model itself and how it generates the audio. Like there's two hosts that have some interplay, and there's, you know, the mannerisms.
It's missing that. It feels cold and technical. It feels like, I don't know, like an LLM has written a very basic technical report on the question that I asked.
Nabeel Hyatt: In general, I think that's my biggest problem with Gemini overall. One is, I think the Gemini models, which just came out and updated, are undoubtedly [00:09:00] really well-performing.
Huge credit to Google; there are several headlines about Google roaring back going on right now in this space, and I think all of that is fair, especially for API work. But they still don't have the tone right. The tone has a, like, 2022, 2023 feeling: this kind of antiseptic, at this point almost grating tone, which is unfortunate.
Claude's obviously, I think, the best at that thing, but even ChatGPT over the last year has gotten, you can feel it, better and better at it. It's not all the way there, but there's clearly somebody in there who has put a little bit of time and effort into trying to get that right. And Gemini seems very far behind when it comes to that.
And so that's the first bit: its writing style. The second thing, though, is I think you're right. I don't know that the output of this deep research is supposed to look exactly like a generic Word-document research report. There's something about the [00:10:00] format and interplay between you and the document that just needs some product iteration.
And that's right, I don't know what it is. I'm thinking back to some of our conversations over this year. Is there more of a malleable-interface version of this idea, where I'm playing with artifacts, playing with it almost as a dashboard of these ideas? Or is it a situation where it's doing all this really deep research but then augmenting me while I write or flow or speak? The way that, say, the guys at Granola would have designed this interface: more as a silent co-pilot in the background that's making you smarter and filling things in, versus trying to yell over you and write the paper for you.
Like, it feels like the research task is really great, but yeah, the product instantiation of it is probably another cycle away from really hitting something that is mass-market.
Fraser Kelton: Yeah, it's interesting, right? Because there are two product [00:11:00] problems that have now been smudged into one. One is the agent going across the web to do the laborious research and synthesis for you, right?
And I'm pretty particular; I want to have a lot of control over what I write. And right now, you don't get the deep research and synthesis without the report written in the style of however they've set it up, right? And so you can imagine, as you said, there is likely going to be a lot of good product work and discovery to be done.
Maybe the output of that deep research is literally just the research, rather than trying to put it into the digestible report, which is the final step.
Nabeel Hyatt: You know, that's right. By the way, I had a funny one. You don't know this, but I fed in the transcripts of all of our hallway chats for the year.
And I also had all these models try and tell us what our own review of the year was, which I'm going to get to in a second. But I would say [00:12:00] my biggest takeaway from this year, if I'm trying to think about it, is: this is the year where the AI workflow interface kind of came into focus.
For me, it's this three-panel interface where you have a Dropbox of context, where now in both ChatGPT and Claude and others you can pull in a Word doc, you can connect it to Google Drive, whatever. This is where you get context for me to talk, instead of just me being in a raw chat.
And then there's the chat window itself, which is still the user interface du jour of the way that I interact with this model. And then there's the template, the tablet, or the play space, the Claude artifact area, where the model is now rendering something for you. And this kind of three-panel pattern, Dropbox, chat, and then artifact, I now see everywhere. It's in ChatGPT, it's been fully adopted there, but it's also in stuff like Cursor and Windsurf, and you see it if you look at NotebookLM. I think it's the trend, and maybe that's the way [00:13:00] of summarizing the deep research product.
The deep research product is doing something at the model layer. It's doing work. Great. Now, at the very least, you should surface it up into an interface where it's just providing stuff for the context window. That's just the left panel: it's throwing a bunch of interesting stuff into the context window.
And then, by the way, let me drag any documents I want over there. Then let me chat in the center, and on the right-hand side, now we can build a paper together. Now we can build this thing together. I would at least start there. Now, this is just, you know, two VCs opining on the product. All of the nuance and wonder and joy of product comes from the doing.
I'm sure you'd do that, you'd get it up there, you'd find 15 things wrong with it, and then you'd have to iterate. But if I'm building a product on January 1st, 2025, I would at least start with that as my palette and then go.
Fraser Kelton: Yeah, well, it is always good to dunk on ourselves, but [00:14:00] there's another way to think about it, which is more kind.
And that is, I think you and I can have a user-centric point of view on these things. If you think about the technology that they delivered, it's quite cool, but the value that they're delivering to the end user is a report, right? And so what they're basically saying is: we have a middling written report for you, because that's the right work product for this experience.
And I would be shocked if the work product that people actually care about isn't the automation of the actual deep research, rather than the synthesis and the writing of a really oddly framed report on it.
Nabeel Hyatt: Yeah, I agree. So, review of 24, from ourselves to ourselves. I took all of the transcripts. Well, in the case of some of these, like ChatGPT when it has search, I just said: look at all the podcasts for Hallway Chat for 2024,
and write me up a high-level [00:15:00] review. I want you to focus on the most creative and insightful topics. I do not want a generic summary; I want an insight-filled set of headlines. Even subjects that were mentioned only once can make it into the summary if they are valuable enough to founders and product builders of AI startups.
So that was my prompt. I used the same prompt everywhere. I gave this to ChatGPT, and then to ChatGPT's o1 model as well, and then to Claude. And I have to tell you, it's an interesting situation that I've struggled with. I'll read them out to you, but my feeling was that Claude beat all of them.
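The same-prompt, many-models comparison Nabeel describes can be sketched as below. The prompt text is paraphrased from the episode, and the stub callables are assumptions standing in for real SDK calls to ChatGPT, o1, and Claude; swap in API-backed callables to run it live.

```python
def build_prompt(transcripts):
    """Assemble the year-in-review prompt over all episode transcripts."""
    instructions = (
        "Look at all the podcasts for Hallway Chat for 2024 and write me a "
        "high-level review. Focus on the most creative and insightful topics. "
        "I do not want a generic summary; I want an insight-filled set of "
        "headlines. Even subjects mentioned only once can make it in if they "
        "are valuable enough to founders and product builders of AI startups."
    )
    # Append every transcript as shared context, separated for readability.
    return instructions + "\n\nTranscripts:\n\n" + "\n\n---\n\n".join(transcripts)

def compare_models(prompt, models):
    """Send the identical prompt to every model and collect the answers."""
    return {name: ask(prompt) for name, ask in models.items()}

# Hypothetical stubs in place of real model calls, so the sketch is runnable.
stubs = {
    "claude": lambda p: "1. The OKR-ification trap ...",
    "o1": lambda p: "1. Specialization beats size ...",
}
answers = compare_models(build_prompt(["episode one text", "episode two text"]), stubs)
```

Holding the prompt constant is the whole point: any difference in the answers then reflects the models, not the framing.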
And it leaves me puzzled, because I would have thought that, going back through a year's worth of transcripts, something like o1 would just do deeper research on the topics. And in real time while we're talking, by the way, I decided to drop this into Deep Research and get feedback from it as well.
So I'm going to come back to that in just a second. So, ChatGPT basically gave me a list of [00:16:00] 10 or 12 things. And then I said, that's too many; basically summarize it to the top three.
Fraser Kelton: Yeah.
Nabeel Hyatt: ChatGPT's answer was: user-centric design over model obsession. So, you know, focus, right. And then second, avoid the AI hype trap.
And then third was: tailor AI solutions to diverse user needs. It's terrible. It's terrible.
Fraser Kelton: It's terrible. I'm not listening to this.
Nabeel Hyatt: You didn't even internalize any of those words, and neither did anybody listening. So GPT o1, now, this is at least processable. Again, it gave me a list of 12, 14. And again, I said,
listen, sum it up, dudes. And it said: specialization beats size. Startups are discovering that smaller, domain-specific AI models deliver more value than massive general-purpose ones. Second: data is your differentiator. Curation and careful selection of training data has become as essential as product building, blah, blah, [00:17:00] blah.
Yep. Third: culture over code. Engineering a team that's curious, adaptable, and strategic about AI technology is more important than any single tool or algorithm. I don't know when we ever talked about that third one, by the way, but it seems like a half-decent point. I like it. Yeah, but all those seem fine.
They honestly don't feel like the things we actually focused on. It feels like o1 did a lot of research and came back with some generic advice. Here's Claude. Now, with Claude I had to do a little bit more work; in some cases I had to copy and paste a bunch of transcripts into the project, because it didn't have web search. But here's what it said. Here are the three major strategic insights for founders and builders.
One: the OKR-ification trap. The obsession with short-term metrics and quarterly OKRs is actively preventing innovation in AI products. After a decade of optimizing for efficiency gains, the startup ecosystem has developed a muscle memory for quick iterations and measurable growth. The companies making breakthrough progress are the ones willing to resist this pressure for [00:18:00] immediate metrics and take a long-term view on some things that aren't measurable.
Which literally we talked about, like, four minutes ago in this podcast. Second: software should be soft. I like that; it pulled out a quote from somebody that we spoke with. A fundamental shift is happening from shipping fixed software products to generating malleable software on demand. It keeps going, but I'll move to the rest.
Go deep to stay ahead is the third one. While many startups have tried to move quickly to build on top of the AI APIs, the most defensible positions seem to come from going deeper in the stack. This means not just prompt engineering, but looking at key parts of your data exhaust and what unique insights it may develop for your users over time.
Like, what do you think when I say all that back? Does this sound like insights at all? Are they usable?
Fraser Kelton: The Claude one, more so than the others. But yeah, sure, I think that gets there. Rolling conversations that are unfocused over a 12-month period are hard to [00:19:00] cut through and summarize, but that's a pretty good job.
Nabeel Hyatt: Okay. So I also have real-time feedback from Gemini. This is Deep Research Gemini, and it might be better. What it actually did was pull out headlines. It actually took quotes from our podcast and turned them into headlines, which I guess is okay, but it's also basically just the headlines of podcasts that we've done. But: forget incumbents, it's startups versus the LLMs. Okay. Right? Software should be soft. Okay. Right? Design an AI for user agency instead of your interface. How AI is empowering hobbyists and creatives, which we did a whole episode on, but also touched on otherwise. What does it feel like to build for a hobbyist versus the creator economy?
And then this one's just a headline: making [00:20:00] product on the S-curve of AI, adapting to disruption. Which is just literally a headline of one of the shows.
Fraser Kelton: Not bad, not bad. That's been a reasonable theme throughout the year, though, to be fair: where are we, and what type of features should you be building?
Absolutely. You know, I think you glossed over something, because we had that false start today. The OKR-ification, and careful what you measure, and you can only drive improvements on what you measure, et cetera, et cetera. I don't think we've talked about Claude and the tonality. What is there to say?
Nabeel Hyatt: I think, you know, we talked about Gemini. Yeah, it still sounds like a totally generic AI model speaking to me. 23, I think, was when Claude really got its tone right. We're two years in, and I don't think anyone's really matched it still. I think that's largely because there's no eval for it, right? There's still not an effective eval for it.
And so the word of 2024 is taste. The overused word. That's appropriate; an unfortunately overused [00:21:00] word, yep. But it's a cliche for a reason. It is the thing, right? And taste is a very hard thing to measure.
Fraser Kelton: Very hard thing to measure. If your pipelines and processes are built around optimizing for evals, good luck getting a great eval for this.
And so, why did I want to re-raise this? It's because these soft things are what make products great, right? They might've gotten the Claude tone right in 2023, but it is clearly getting better and better and better. The vector of progress is in place, and others are noticing it, right?
There's a whole New York Times article on what you and I know from day to day: people are going crazy for it in the Bay Area. I had that moment I talked about, where it was helping me bake the roast, and the tone was so, [00:22:00] so good that I thought of it as a companion for one quick second. And I'm very much not in that camp.
Nabeel Hyatt: Look, I think it's always true. At almost any time in the last 20, 30, 40 years we could talk about the value of design and product. I could just play quotes from Steve Jobs from 1985 for the next 20 minutes and it would be fine. I'd love that. That's always true. I wouldn't mind it at all.
I would say the thing that's even more true now is this: in a world where we have been data-centered for the last 15 years, and we're getting better and better at measuring more things, you have to assume that all of your competition is also measuring those things. It is true that what you measure is what you get better at. But in a world where more things are measured, the leverage therefore increases in the things that are unmeasurable.
And so your challenge as a leader is: it's so easy to put up an OKR or an eval, so easy to drive towards the goal. In fact, that's the cleanest way I can motivate my team and know we've hit it, [00:23:00] so we don't just self-satisfyingly pat ourselves on the back and say good job, but actually know we've really delivered. But at the very same time, the most unmeasurable parts of your product are probably its most defensible parts, right?
And that is a little bit of a conundrum that will actually only grow in leverage over the next couple of years. Because, of course, everything you can eval is not just your team evaling it with an OKR now; it's the models you're building evaling against it as well. So your model is going to be self-improving, and these other models are going to be self-improving.
And so no matter what, it's a rat race, right? Therefore, whatever parts of your product philosophy, whatever parts of your customer experience seem of value but are very hard to quantify, those are the parts whose competitive edge will grow in relative value over the next couple of years.
That's probably, frankly, a five-to-ten-year trend, for sure, if you can get it right. But I recognize the question: how do I then motivate my team toward the unmeasurable? That's a very interesting, [00:24:00] longer discussion than we have time for now, but it's where we should all be trying to search.
Fraser Kelton: Yep.
So that was a long, rambling road that you and I took from your question of: what have we seen now that is likely to become compelling products sometime next year? And I would put Deep Research in that bucket. I think it's great research getting it there. Think of it through the lens of a team building a demo platform to show that research and what it can do.
They did good work to get it out, probably on a very tight timeline, but it's not the product that it's ultimately going to materialize as. We're going to see that sometime next year, and it's going to be awesome. There's no doubt that research and synthesis that goes super deep is going to be so valuable for so many different use cases and so many different users.
Nabeel Hyatt: I've got two others. So if your first one is that one, I've got two others: a big one and a small one. My big one is desktop real-time streaming, which Google launched; there are also some startups, like [00:25:00] Highlight, that have launched desktop real-time streaming. This idea that the context window for your chat with an AI is just the things I'm doing on my computer right now is obviously, and clearly, initially magical. But I would argue it's still, in a way, pre-product-market-fit. We are still figuring out what the proper affordances are for that relationship. Again, it's like the agentic stuff from a year-and-a-half-ago summer, where you get little glimmers of what it feels like and you're like, oh, this is definitely the future. But then you kind of reflect back on yourself a week and a half later, and you're like: I don't know exactly how it fits into my life, and I don't know which workflows, and I don't know where to talk about what, and all the rest of those things that have yet to really be negotiated and worked out, right?
So that's the one I'd be watching. I think it's a real thing, and it will probably take a little bit of duration before we get there. Do you agree, or have you found patterns for Google's desktop streaming?
Fraser Kelton: No, I totally agree. It is so intoxicatingly fun, though, to think about what types of [00:26:00] products are going to be launched on top of this in the coming future.
The issue, and you raised this to me last time we were chatting about it, is: I don't want what feels like accessibility options narrating what is on the screen in front of me. Right? Like you had the same experience as me: I see a dashboard, and beside that is a webcam video showing somebody, and da da da da da da.
Yeah, it's like,
Nabeel Hyatt: Thanks, man. I'm not blind. I'm looking at the window. I understand it's a window. I don't need it to describe all the windows on my screen.
Fraser Kelton: But it will be amazing, because in the short term you'll ask it how to do something, it will tell you, and then you'll just go and do it. And then, in time, there will be products that just control your mouse and go and do those things with that technology.
But we're just getting a glimpse of the capability that will get better and better and better, and then be refined into the product that is able to actually deliver those things. It doesn't feel like it's there today.
Nabeel Hyatt: Yeah, I agree with that, but that makes me excited for the future. We get to see all the experimentation.
Fraser Kelton: Oh, sure. I've told you this [00:27:00] before: the one that I'm most excited about for the future is these reasoning models, first seen in o1. Part of the problem is that the audience of people who are paying attention to research demos is multiple orders of magnitude bigger than it was even a handful of years ago, right?
And you and I played around with GPT-2 when it was released, and we would have kind of squinted at it and tried to make sense of it. But the total audience of people playing with that was probably in the thousands, right? But now you land something that is really remarkable research,
in o1, and people are like: oh, it doesn't write my essays as well as 4o or Claude. Like, it's not good. And that's because the audience for these demos and research releases is now basically hundreds of millions of people. I messaged a whole bunch of friends there to congratulate them when I got to [00:28:00] experience it for the first time, because it really does feel to me like GPT-2: it's not ChatGPT yet, but they've got it good enough that you can now experience what's about to happen.
Yeah. And we're going to see, in two to three or four years, I think, the same level of change that we've just lived through from GPT-2 to now, all over again, because of that type of model and that architecture.
Nabeel Hyatt: Yeah, the phrase I had when somebody asked me about o1 last week at a dinner was: oh yeah, it feels like a model that's just missing its ChatGPT moment.
Fraser Kelton: Yeah, yeah. But going from GPT-2 to ChatGPT was like three or four years and a whole bunch of different generations. And that's a great way to put it: we are going to see it get orders of magnitude better, and we're also going to learn how to shape it into a product experience that works.
It already is profound. It's so, so remarkable. I think it's one of the most [00:29:00] amazing technical releases of the year for me, for sure. And I think we're a long way away from seeing the product.
Nabeel Hyatt: I still don't know when to turn to it, like when to use it and when not to use it. There are times where the tonality, for instance, of Claude is just a better product for what I need.
Even though it was kind of a researchy question, it just still answers better. But the thing I like most about o1 is it seems more assertive and more disagreeable, in a perfectly wonderful way. Most of these models are so ridiculously sycophantic at this point that they are just there to make you happy.
And, yes, I will go do that, master. And they do whatever. In fact, I've gotten to putting "please disagree with me" inside the Claude instructions inside of Projects to try and get it to disagree with me. And o1 just seems fundamentally more willing, possibly because of its context window, possibly its researchers' doing, to come back and say, hey, I think your argument here is flawed, or you're doing something wrong here.
So on and so forth. I fed it a transcript of a conversation I was having with some partners [00:30:00] on a weird side project where I'm opening a board game library in Berkeley. Just for fun, I took a two to three hour conversation, took the entire transcript, dropped it in, and basically said: what do you think about the topics discussed?
What did we miss? And what do you disagree with most? And provide reasoning against that. And it did a very good job, in a way that ChatGPT was horrible at, and Claude was good at, but I thought o1 was really interesting in the way that it dissected out the logical fallacies in the things we were talking about, and then pushed back with a little bit of research, a little bit of thought process, against the different topics we were having.
Really helpful, obviously, for a very low stakes conversation for our little weird side project, but it was great. It was great.
Fraser Kelton: There's so much to like about that little anecdote. Even the idea that you have, with friends, a side project to start a board game store in Berkeley.
Nabeel Hyatt: Library. That I'm also then randomly recording all of, and then also feeding into the files.
Fraser Kelton: Yeah, yeah, yeah. Everything that you just said is what [00:31:00] people said when GPT-3 was first introduced, right? Like, oh, I don't really know when to use it. I don't know how to use it yet. And it feels like they are meandering that path with us, they being the model builders, right? That's what's so wonderful about this moment: there's a lot of people pushing and pulling on these things and experimenting in real time to see what they can do, and what's novel, and what's interesting, and what's not working. And then it gets fed back to the people building the models, and they improve it in those directions. And so a couple generations from now, we're going to collectively, like, heave it forward.
Nabeel Hyatt: I think you're absolutely right. And it makes me feel a little bit more optimistic about the whole ecosystem, because I do worry about everything becoming closed off. And you're right, these things are pretty raw, which means we're all collectively in the playground together trying to figure it all out, which is exactly what you want.
Fraser Kelton: What you want to do, what you want to be.
Yeah, for sure. Here's one looking back. It's not from December, but we haven't really discussed it at length. You and [00:32:00] I spent a good amount of time playing around with WebSim. I think we mentioned it way back when, but then they went from just hallucinating websites to basically like hallucinating web apps.
And I was reflecting on this year. I think that's the most fascinating product that I've used this year. And I pick "fascinating" carefully, because it's not entirely useful and it's not entirely valuable, but oftentimes those things emerge from things that are fascinating. And, you know, we like Sean and Rob so much. They have framed it as self expression through software.
You know, I think the things that are obvious when writing code becomes automatic and, like, democratized or free: you're going to have software apps that become disposable. You're going to write your SaaS on demand. You're going to have all these other things. And this was an experience where you looked at it, you played around with it.
You were part of the community. And [00:33:00] self expression through software, with WebSim, feels like we are going to discover awesome things in the future.
Nabeel Hyatt: I agree. It reminds me of Tumblr. It reminds me of MySpace. It reminds me of the first BBS I ever built way back when. It's just kind of this manifestation of self in a way that is necessarily more playful and more expressive than it is some kind of, like, utilitarian-function, customer-development, vertical-task-process thing. You know, that reminded me of something that we talked about a while ago, but I don't think we ever brought up here together, which is a thing that's been teasing away in my brain about how none of these products ever really changes. They just get reinvented for the new technical framework.
Like, that's totally what that brought up for me. You know, Discord is basically the same as AOL IM, which is basically the same as IRC. This view that that kind of real time messaging and communication layer is probably [00:34:00] a thing that still exists, either with Discord or some other company, in 20 or 40 years. It's an immutable need.
And then if you take that lens, that self expression is one of these almost canonical behaviors that people want to do, you can use that lens and look through the major categories and try to find the areas that, you know, maybe our founders in 2025 can spend more time plumbing away at. Because even though we're a couple of years into AI, there's probably some of these canonical categories that haven't really been fully explored.
What's funny is I totally forgot about doing this, but there's another situation where you and I had talked about it. The partnership had talked about it as well; we covered it briefly at an offsite. I talked to some models about it as well, so I have some Gemini and some Claude and so on conversations that I just had in the car, mostly, while driving around.
That is my new default, by the way. You should short podcasts as a medium, because when I get into the car now, it's just [00:35:00] voice. I just turn on ChatGPT voice mode, and I dump out some of the things that came from the last meeting, and I'm just trying to think out loud. I know I'll have a transcript of it, I'll also get some feedback on it, I have a sparring partner for it, in a way where normally I would have queued up a podcast to fill that time when I'm commuting. I don't know what that means for the thing we're doing right here, but that's, that's fine.
Okay, so here's what I wrote: I'm trying to consider the major categories that have endured in consumer and creative software, considering the set of primitives that the internet and consumers just need. For example, IRC becomes AOL IM becomes Discord; eBay and Amazon Marketplace and Etsy; Print Shop becomes Adobe becomes Canva. And I gave some other ideas.
And then I said: please consider other canonical products from the late 1990s. Whether they were large companies doesn't matter, just that they felt popular and essential. [00:36:00] And then think about modern companies from the 2010s that match the same usage pattern. There does not need to be a contemporary post-2020 company.
Blah, blah, blah. Think of five to start, and then we'll hack away from there. It gives me back some answers. I go back and forth a little bit, because of course it's slightly off; I didn't get it quite right in vector space. So I kind of reoriented it. I'm going to read off a handful of these, and I want you to think about them, especially in light of the conversation we just had, which was about 2024.
We have some experiments in real time, we have some experiments in deep research, and we have this evolution or manifestation of this, like, almost three-panel view: artifacts, chat, and then the context window. So with that in mind, here are the canonical things that it came up with. The first is information retrieval.
This is an obvious one. Ask Jeeves becomes Quora becomes [00:37:00] something. The kind of Q&A format, that kind of stuff. Now, I don't like this list.
I'm looking at this list now. I have a lot of lists that these models gave me, and I don't love the lists, to be honest. They're like, okay.
Fraser Kelton: But they need some human love. Uh, they're still not great. That sums up the past couple of years.
Nabeel Hyatt: That's right. Okay, I'll read off some of them that seem kind of interesting, but I'm going to skip the ones that seem dumb.
So, yeah, I think there's a question of what the future of the Q&A-format Quora is, and whether that gets completely solved inside of ChatGPT or whether there's something new to be invented there. Writing and document tools: the kind of WordPerfect becomes Google Docs becomes Notion. What's the next version of that?
Um, it might well look like Notion, I don't know. There's a world where it's something quite different. Data analysis: a core canonical thing you need to do. You started doing it with [00:38:00] Excel or Google Sheets or SQL. What is the next version of that? We've seen lots of startups experimenting with it, and it feels like we haven't quite gotten to what it's supposed to be.
What I think about a lot, that is on this list too, is online reviews and recommendations, you know, Epinions to Yelp. What does it feel like to get judgment on something in the future? The way I would rethink that is: if Web 2.0 was the wisdom of crowds, then I think of this AI age as the wisdom of experts.
Hmm, it's a different thing. If I'm asking why I have this rash on my face, what it is, I don't need the wisdom-of-the-crowds version. I don't want the Google version of what the rash on my face means, which, as we all know, is just going to be cancer, because everything you look up on Google is going to be cancer.
But what I want is the wisdom of experts. I want, you know, 40 doctors who are incredibly smart, who have inputted into the model thoughts about what this could be. And we're seeing this, obviously, with the model companies, who are spending ridiculous amounts of money right now hiring a bunch of PhDs to do a bunch of model data entry.
And so I wonder if there's a [00:39:00] similar version here, where, you know, if this era is the wisdom of experts, then is there a version of that that has to do with this particular product category? Otherwise, I don't know where we get new information. If you're not paying people to figure out what the good coffee maker is, I don't know where we get that new information on the internet, versus internet slop, versus all the stuff that ChatGPT is just going to spit out with no net-new insights, right? No ground truth data from somebody who walked into that restaurant and thought the ambiance was good or bad.
Fraser Kelton: Very interesting. Certainly that's going to continue, though. Like, there will be a new Reddit.
Like, I use Reddit today for discovering what to buy, but there will be an entirely new one, undoubtedly, right? Tell me.
Nabeel Hyatt: You used to contribute to Yelp, and you would get a little bit of fame and fortune, because you'd be SEO'd so people could find it later. And then Yelp would get [00:40:00] traffic off of that, which they can then monetize back into advertising revenue from stores
who are trying to get awareness. But if I'm just speaking to the model, the model, first of all, removes me as an active actor. Even though I reviewed that restaurant, it doesn't have to say my name, so I don't get fame and fortune. I don't feel like a happy Yelp Elite, right? So that disincentivizes me from contributing to the model, because I'm not getting recognition.
Then there's no advertising either, because the model is just responding. The wisdom-of-the-crowds model of Web 2.0 just gets broken. And I don't know what the new model will be for net-new knowledge generation on the internet, other than models hiring PhDs by the hour to put smarts into their models, which is the answer in 2024.
But that feels brute force and ugly, and not scalable or sustainable over time. That's not an ecosystem that's healthy. That feels wrong. That feels like starting a university by paying a [00:41:00] professor to just write as many papers as possible per month, and paying them by paper output. That's not an ecosystem.
Fraser Kelton: We, like, stumbled into what I think is an awesome discussion, but I don't think we should do it today. No, no, this is a really meaty topic. It took me a second to realize even what you were talking about, but it breaks Web 2.0's core proposition. Maybe there's a new product experience that leverages AI, but it's not just you to the model. And maybe there is still some sort of dynamic where the collective is present, or some small community is present.
Nabeel Hyatt: For me, the summation is: I wanted to know the general wisdom of crowds in Web 2.0. I'm going to crowdsource thousands and thousands of opinions and get the net average of that opinion.
That is not the way I'm going to want to interact with an LLM. I want an LLM to know who's smart, where the experts are in that topic, and weight those [00:42:00] far more than the person who is uneducated or doesn't know about that topic. The PhD in that topic, I want to understand and be able to derive their knowledge.
And that is what people do. That's implicitly what evals are doing, by the way, right? You have a bunch of PhDs upvoting and downvoting the right and wrong answers, and therefore leaning it towards where the PhDs think is right. That's what's happening. That's how it gets better math answers over time, right?
And that's how it gets better philosophy answers over time. Only they're doing that in a way that feels brute force and unsustainable, by just literally paying smart people by the hour to do it as much as possible. That does not feel like a way you have reconstructed the global economy for the future.
That feels like a short-term, small-brained fix where you're throwing money at the problem. There's something teasing away at me that feels like there's a more elegant way that we should be organizing the economy for the future, if this is really how we're all going to be smart and how we're all going to do work over the course of the next couple of decades.
And we can [00:43:00] stop there and we can come back to it, but I think that's the summation of the issue. You know, we started with a Yelp conversation, but obviously it's much deeper and broader than that. It's how these models get smart.
Fraser Kelton: You think it's small-brained to, like, hire doctors to not just rank the results, but create the reward data, or the fine-tuning RLHF data?
Nabeel Hyatt: I don't know. The, like, purest Capitalism 101 view of this is: yes, you pay the people for the labor, and then they make the thing, and then you get leverage off of the labor, and that's how the economy should work. I am still enamored with the very elegant way, not that it didn't have its flaws, but the very elegant way that we allowed people to do the things they were really passionate about, kind of intuitively and intrinsically, in Web 2.0.
And that doesn't mean they don't get paid. I'm not saying we don't pay people. Like, people should find ways to be remunerated for their work. But that's probably the summation of what I'm trying to grasp at.
Fraser Kelton: Boy, there's a whole lot of different things here. Like, where's the data for training the large models?
Where is the outlet for self expression online? The business model of Yelp and Reddit and everything else has allowed these communities to thrive and flourish. But in a world where, like, the business model changes, how are the services going to change, or which new ones are going to emerge to take their place?
There's a lot of, like, really interesting stuff there.
Nabeel Hyatt: Okay, so I'm going to finish up on a couple of other canonical categories, and then you'll see if any of these trigger another deep, whole, hour-long conversation. I don't think so. I mean, the other ones that are on here are [00:45:00] media consumption.
That's an obvious one. What's the future of music and movies and things like that? How do diffusion models and the rest play into that? That's a clear one; it's a canonical thing we do. Marketplaces: the eBays and Amazons and Etsys. And I think there are areas on the search side where you can probably reinvent the interface.
And then we've seen some folks experimenting also on the creation side, right? What happens to Etsy when you can imagine everything that you could possibly make? And is there a way that that becomes a dialogue between the creator and the buyer? Are there ways you change that relationship so it feels more collaborative?
I think there are lots of fun, interesting ways you rebuild what an online marketplace feels like there. Buddy, anything else on here? There's always some version of Asana, Monday.com; some version of keeping track of everything.
Fraser Kelton: There will be a new version and it will be rethought for a world of AI capabilities, for sure.[00:46:00]
For sure, yep, 100%. Like, the old ones are ridiculously sticky. Like, Jira is still around, right? Unfortunately. But there will be a new modern experience rethought for a world of AI. Yeah, that's right.
Nabeel Hyatt: We could stop there. Those are some things to go over on your Christmas break, when you have a minute with your family and then you're bored because they have some Christmas show on that's just way too boring, but they like to watch it every single year, and your mind is wandering.
You can let it wander into this area.
Fraser Kelton: I have an update. Um, recently we talked about how there was no product from before 2023 that has introduced AI features that are actually usable. And this squarely puts me on the left side of the curve, no doubt. I think that on the whole, the integration of Apple Intelligence is really janky, and it's hamfisted.
Come on. No, no, no, no. Wait, wait, wait, wait, wait, wait. And it's hamfisted. And, like, I don't want [00:47:00] the rewrite feature when I just want to copy and paste this stuff. Don't give me that. I am super proud of the gang who has, like, the new camera button that allows you to go right into vision for ChatGPT.
But I demoed that, like, twice, and it's cool, and I don't know how often I'll return to it. The thing that I love, it might be one of my most beloved AI features of 2024, is the summaries in Mail and in Messages. It is so good. You love it. It is so good. I love it. I consume it, you know, a hundred times a day.
Nabeel Hyatt: Yes.
Fraser Kelton: And they're great. I don't know. I don't know, man.
Nabeel Hyatt: I disagree. I respectfully disagree. I do use the Messages summary feature, and I find that it's not great. Maybe it's better in the Mail context, where it has more words and more paragraphs to work with, [00:48:00] so there's something to summarize. But when it's trying to summarize five texts, it gets a little off.
Fraser Kelton: I'm looking at it right now in Mail. It is spot on and useful. They must have trained their own model, I'm guessing, and had somebody who, like, really cares about the product experience.
It's delightful, and you're looking at me like I'm a maniac.
Nabeel Hyatt: The one thing I do want to use is from my friend Dan Shipper over at Every. He shipped this thing called Cora yesterday, and I haven't had a chance to install it, but it is an inbox agent which basically summarizes all of your emails and sends you a summary twice a day, also with some draft responses in your own voice, theoretically.
It's more likely to do that than I am to switch to Apple Mail, that's for sure.
Oh, last bit for me on 2024, a takeaway. I gotta tell you, [00:49:00] I had a conversation this week with a founder who shall remain nameless, but who is running an AI startup, somebody I talk to frequently. And he was just talking about how he came to realize, over the last couple of months, that about half of his AI team, and maybe only 10 percent of his total team, are actually AI red-pilled at this moment.
And let me say what I mean. After the ChatGPT moment, there's a second moment that happens, I think, in people's brains, where they're like: oh, this isn't just a smarter search tool that I can debate with. This is like a junior coworker and partner. Not "is AI going to be helpful," but there's this moment that you have on this journey, where for a lot of people it's watching Devin or some kind of agentic thing run around and actually, like, really do work. And I [00:50:00] had a conversation with a founder who was just like, yeah, I realized that most of my product team, and even most of my AI team, wasn't yet at that level of belief in what was coming down the pike. And then I relayed that to a portfolio CEO yesterday.
And he said to me, like, I have this exact problem. Most of my executive team uses ChatGPT regularly, but for kind of, you know, Google-search-plus-plus kinds of needs, and is not at all, he called it, living agentically. Not really considering what the world feels like if you're living agentically, in this world where you're almost thinking like you're giving instructions to an intern: let me write in paragraphs and really describe this problem to you, so that you can really get to the root of the answer and try and feed it back to me.
Um, I don't know that I have a takeaway other than, like, I was like, holy crap.
Fraser Kelton: There's a whole bunch of different thoughts in my head all at once. The [00:51:00] saying that the future's already here, it's just not evenly distributed. The leverage given by this technology is going to be so dramatic that the people who are, you know, early to adopt it are going to have tremendous productivity gains relative to everybody else.
That's the obvious thought. The other thought that I keep coming back to, that I think a lot of people don't appreciate, is that people are busy with busy lives, and they have hobbies, and they have stress. And for you and me, like, we love this stuff, and our lives would be this even if it wasn't a big part of our job.
But for most people, it's kind of like, hmm, interesting. And then they go watch the Lions or the Bills, right? Like, no shade, but it's just different there. I am constantly surprised that even in technology, even with people who are at the forefront of technology, there's oftentimes a lack of plasticity in thought.
They might not have the imagination to [00:52:00] appreciate that if it's doing this today, a year from now, this is what it's going to be doing for all of us.
And I think if you combine those last two, especially: if you're not playing around with it, and you also don't have a great, elastic imagination, it can be tough to see a month or six or eight or twelve ahead.
Nabeel Hyatt: I think I agree with you on both of those points. And maybe the challenge back is just not to take for granted that even the people who are closest to us are operating with exactly the same priors and therefore making the same decisions. For people on our team, I'm going to do this for our own team: I'm going to make sure everybody's opened up WebSim, Replit Agents, and maybe Gemini Deep Research.
Three agentic computing things: spend 15 minutes at each, build something at each, watch it work. It's [00:53:00] a different mental model than the way you interact with something like ChatGPT, and it will help you think differently about product. And I have to first try and get my wife there. But then I think it's a good challenge to try and make sure, frankly, all the CEOs that I'm working with, and frankly all of the people that they work with, have gone through a process like this. Because maybe I underestimate how much of an edge it will actually get you. Even inside of this little bubble of AI we're all already in, there's still a disparity in where we're sitting in our worldviews.
Fraser Kelton: Mm hmm.
Nabeel Hyatt: Let alone the rest of the world. Right. Let alone the rest of the world, for sure. So build your best WebSim, build your best thing in Replit, or v0 from Vercel is also quite good; I've been playing around with that lately. Windsurf is awesome. I lost hours of my life to a random Windsurf project from one o'clock to three o'clock in the morning earlier this week.
Go play with these things over the holiday break. You'll enjoy it. Even if you've never coded before.
Fraser Kelton: You know, one last thought that all of this you just [00:54:00] raised has reminded me of: a lot of people in the past three months have trotted out the Gartner Hype Cycle curve, and they're like, hey, we're in the trough of disillusionment, or whatever it is, and we'll soon be up on the plateau of productivity.
There have been, honestly, like ten people who have raised that in the past couple of months, and my feedback to them is: I actually don't think of it that way at all. The Gartner Hype Cycle was invented in the 90s for desktop scanners, right, and webcams. Whereas I think, for the reasons you just said, the CEOs of these companies, who are on the frontier of building the future in different fields, are still maybe underappreciating what this technology is going to do.
I think we are overall still not nearly as excited as a society about what's going to happen on a 10-year time horizon. I think we're still underestimating it.
Nabeel Hyatt: I agree. [00:55:00] I'm sure I am too, right? It's one of those things where the number of zeros is too hard to totally internalize, no matter how many times you stare at it, kind of thing.
Fraser Kelton: Yeah, that's right. That's right. You and I should be better positioned than almost anybody to appreciate how crazy it's going to get. And I've told you, like, five times in the past two months that I'm still underestimating how profound some of these products are going to be in our lives in a couple of years.
Yeah. All right, well, on that note, maybe that's how to wrap up.
Nabeel Hyatt: That's it for now. We have plenty of time over the break to play with things. I'm hoping I find three more releases next week to play with, more AI things, and to think about the future. And also, maybe spend a little time with your family as well. Like, we can do a little bit of that as well.
Fraser Kelton: I'm gonna be hitting the Apple summaries in my Mail app. Take care, man. See ya.