Quora’s architecture has always been interesting because it tries to balance speed, live updates, and a fairly practical technology stack. If you’ve looked at how Stack Exchange or Facebook built their systems, Quora sits somewhere in the same conversation: not flashy for the sake of it, but very deliberate about what goes where.
Search is limited, but fast
Quora’s search is narrower than a full web search engine. It can search questions, topic tags, user names, and topic titles. It does not offer full-text search, so you cannot search inside the body of questions or answers.
The search behavior is also very responsive. It uses prefix matching, so typing "mi" can immediately surface something like Microsoft. There is also a simple fuzzy-matching layer, but nothing especially advanced. If two questions are duplicates, one may be redirected to the other, though both can still appear in search results. There is no spell checker.
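The article does not say how Sphinx or its Python replacement matched titles. Below is a minimal sketch that assumes an in-memory list of titles. Prefix matching uses a sorted list and `bisect`, and `difflib.get_close_matches` stands in for the simple fuzzy layer. The `PrefixIndex` class and its 0.6 similarity cutoff are hypothetical.

```python
import bisect
import difflib


class PrefixIndex:
    """Hypothetical title index: prefix matching first, simple fuzzy matching as a fallback."""

    def __init__(self, titles):
        # Sorted lowercase keys keep every title sharing a prefix in one contiguous run.
        self._entries = sorted((title.lower(), title) for title in titles)
        self._keys = [key for key, _ in self._entries]

    def prefix(self, query, limit=5):
        q = query.lower()
        start = bisect.bisect_left(self._keys, q)  # first key >= query
        hits = []
        for key, title in self._entries[start:]:
            if not key.startswith(q) or len(hits) >= limit:
                break
            hits.append(title)
        return hits

    def search(self, query, limit=5):
        hits = self.prefix(query, limit)
        if hits:
            return hits
        # No prefix hit: fall back to difflib's similarity ratio (not a real spell checker).
        by_key = dict(self._entries)
        close = difflib.get_close_matches(query.lower(), self._keys, n=limit, cutoff=0.6)
        return [by_key[key] for key in close]


index = PrefixIndex(["Microsoft", "Microsoft Excel", "Minecraft", "Google"])
print(index.search("mi"))        # ['Microsoft', 'Microsoft Excel', 'Minecraft']
print(index.search("micrsoft"))  # typo, handled by the fuzzy fallback
```

Because matching titles sit next to each other in sorted order, a prefix lookup costs one binary search plus a short scan. That fits the type-ahead behavior described above.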
At first, Quora used Sphinx, an open-source search server that could handle this kind of behavior. Later, they moved away from it because it felt limiting and replaced it with a newer Python-based solution.
Real-time search and the long-lived connection behind it
Quora’s search requests are sent over AJAX as GET requests, and the response comes back as JSON. Most of the processing of results happens on the server rather than in browser JavaScript. One likely reason is keyword highlighting, which is easier to manage on the server than entirely on the client.
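Here is one plausible shape for such a response. It assumes the server sends ready-made highlighted markup inside the JSON. The `highlight` and `search_response` names and the `<b>` tag are hypothetical. Escaping and highlighting both happen on the server, so the browser only has to insert the result.

```python
import html
import json
import re


def highlight(title, query):
    """Wrap the first case-insensitive match of `query` in <b> tags, escaping everything else."""
    match = re.search(re.escape(query), title, re.IGNORECASE) if query else None
    if not match:
        return html.escape(title)
    start, end = match.span()
    return (
        html.escape(title[:start])
        + "<b>" + html.escape(title[start:end]) + "</b>"
        + html.escape(title[end:])
    )


def search_response(query, titles):
    """Hypothetical endpoint body: the browser only has to insert the ready-made markup."""
    results = [{"title": t, "html": highlight(t, query)} for t in titles]
    return json.dumps({"query": query, "results": results})


print(search_response("mi", ["Microsoft", "Minecraft"]))
```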
The live search feels almost aggressive. If you type Microsoft, Quora may send nine backend queries—one for each character you enter. It does this even if you type quickly. The important part is that the backend decides which requests actually matter, so the front end generating many requests does not automatically mean a proportional spike in server load.
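The article does not explain how the backend decides which requests matter. One common approach is to tag each keystroke with a sequence number and skip any query that a newer one from the same session has already superseded. The sketch below assumes that approach. `LatestOnlySearch` and the session IDs are hypothetical.

```python
class LatestOnlySearch:
    """Hypothetical filter: run a keystroke query only if no newer one exists for that session."""

    def __init__(self, search_fn):
        self._search_fn = search_fn
        self._latest = {}  # session id -> newest sequence number received
        self.executed = 0  # how many queries actually reached the index

    def receive(self, session_id, seq):
        # Called as soon as a request arrives, before any search work is done.
        self._latest[session_id] = max(seq, self._latest.get(session_id, -1))

    def process(self, session_id, seq, query):
        if seq < self._latest.get(session_id, -1):
            return None  # superseded: the user has already typed more characters
        self.executed += 1
        return self._search_fn(query)


def simulate_fast_typing(word):
    backend = LatestOnlySearch(lambda q: ["results for " + q])
    queries = [word[:i] for i in range(1, len(word) + 1)]  # "M", "Mi", ..., "Microsoft"
    for seq in range(len(queries)):
        backend.receive("session-1", seq)  # all keystrokes arrive before the worker catches up
    answers = [backend.process("session-1", seq, q) for seq, q in enumerate(queries)]
    return len(queries), backend.executed, answers[-1]


print(simulate_fast_typing("Microsoft"))  # (9, 1, ['results for Microsoft'])
```

Nine requests arrive, but only the last one reaches the search index. That is the sense in which request volume and server load can diverge.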
Quora also keeps the search connection open with HTTP keep-alive. Once you start typing, the connection is established and reused for the next search request unless you sit idle for 60 seconds.
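Here is a minimal sketch of HTTP keep-alive using only Python's standard library. A local `http.server` speaks HTTP/1.1, and a single `http.client.HTTPConnection` sends one GET per keystroke. The handler name, the `/search` path, and the 60-second idle timeout on the server socket are assumptions that mirror the description. A production setup would more likely enforce this in its proxies than in application code.

```python
import http.client
import http.server
import json
import threading


class SearchHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps connections open by default
    timeout = 60  # assumption: drop the socket after 60 s with no new request

    def do_GET(self):
        body = json.dumps({"path": self.path}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))  # lets the client reuse the socket
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def send_keystrokes(word):
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), SearchHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    local_ports = set()
    try:
        for i in range(1, len(word) + 1):
            conn.request("GET", "/search?q=" + word[:i])
            resp = conn.getresponse()
            resp.read()  # the body must be drained before the socket can be reused
            local_ports.add(conn.sock.getsockname()[1])
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return local_ports


print(send_keystrokes("Microsoft"))  # a single local port: one TCP connection for all 9 requests
```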
Webnode2 and LiveNode
Inside Quora, Webnode2 and LiveNode handle content management. Webnode2 is responsible for generating HTML, CSS, and JavaScript, and it is tightly coupled with LiveNode. In practice, Webnode2 focuses on how content is rendered on the page, while LiveNode handles dynamic updates.
Charlie Cheever once said that if he could start over, the first thing he would do would be to rewrite LiveNode.
The interesting thing about LiveNode is that it makes the page more shared than people might expect. If user A and user B are looking at the same question at the same time, an action by A can affect B’s screen. For example, if A upvotes an answer and that answer moves upward, B may see the change through an AJAX update. If B has comments expanded at that moment, the layout can shift under them.
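LiveNode's internals are not public, so the following is only a hypothetical model of the behavior described. Each question keeps a set of viewers. One user's upvote queues a reorder update for every viewer, and each page would pick it up on its next AJAX request. `LiveQuestion` and the update format are invented for illustration.

```python
from collections import defaultdict


class LiveQuestion:
    """Hypothetical shared page state: every viewer of a question gets the same updates."""

    def __init__(self, answer_ids):
        self.votes = {answer_id: 0 for answer_id in answer_ids}
        self._viewers = set()
        self._outbox = defaultdict(list)  # viewer -> updates not yet delivered

    def watch(self, viewer):
        self._viewers.add(viewer)

    def ranking(self):
        # Most-upvoted answer first; sorted() is stable, so ties keep their original order.
        return sorted(self.votes, key=lambda answer_id: -self.votes[answer_id])

    def upvote(self, answer_id):
        self.votes[answer_id] += 1
        update = {"event": "reorder", "order": self.ranking()}
        for viewer in self._viewers:
            self._outbox[viewer].append(update)

    def poll(self, viewer):
        """Return and clear the viewer's pending updates (what an AJAX request would fetch)."""
        return self._outbox.pop(viewer, [])


question = LiveQuestion(["answer-1", "answer-2"])
question.watch("A")
question.watch("B")
question.upvote("answer-2")  # user A's action...
print(question.poll("B"))    # ...reorders user B's page: [{'event': 'reorder', 'order': ['answer-2', 'answer-1']}]
```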
LiveNode is built with Python, C++, and JavaScript, with jQuery and Cython also in the mix. Quora has wanted to open-source it, but separating the code cleanly would take a great deal of work.
AWS everywhere
Quora runs entirely on Amazon Web Services, using EC2 and S3. For a fast-growing startup, that matters a lot because it avoids the cost and maintenance burden of building and operating its own data center. Their operating system is Ubuntu Linux, which makes deployment and management easier.
Static assets, including images, CSS, and JavaScript, are delivered through Amazon CloudFront. Images are first uploaded to EC2 and then pushed to S3 through a Python S3 API.
Load balancing and the request chain
The front end uses HAProxy for load balancing. Behind it sits Nginx as a reverse proxy, and behind Nginx run Pylons application servers (Pylons + Paste) that handle dynamic web requests.
Pylons is a lightweight web framework often used behind Nginx. Quora did not use it in the standard way. They took the template and ORM pieces out of Pylons and replaced them with their own Python code. That custom layer is where LiveNode and Webnode2 live.
Why Python won
Charlie Cheever and Adam D’Angelo, both formerly of Facebook, chose Python instead of PHP. Adam’s view was blunt: Facebook uses PHP largely for legacy reasons, not because it is the best choice today. They also avoided C#, since that would bring in a Microsoft-heavy stack, and Java did not feel as convenient for writing code quickly. Scala was still too young at the time. Ruby looked similar to Python, but they did not have enough experience with it.
So Python won, even though they knew its main weakness: raw execution speed. Where performance mattered, they dropped down to C and C++. The version they used was Python 2.6.
Another reason Python fit Quora was the way its data structures map cleanly to JSON. The code is readable, there are plenty of libraries, debuggers, and reloaders, and Quora’s browser/server communication is almost entirely JSON-based.
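Here is a concrete example of the JSON point. A record built from ordinary Python dicts, lists, numbers, booleans, and `None` serializes with the standard `json` module and needs no custom encoders. The field names below are hypothetical.

```python
import json

# Hypothetical answer record: dicts, lists, ints, bools, and None map straight onto JSON.
answer = {
    "id": 42,
    "author": "user_a",
    "upvotes": 17,
    "topics": ["Python", "Architecture"],
    "collapsed": False,
    "edited_at": None,
}

wire = json.dumps(answer)
print(wire)
# {"id": 42, "author": "user_a", "upvotes": 17, "topics": ["Python", "Architecture"], "collapsed": false, "edited_at": null}

assert json.loads(wire) == answer  # round-trips without custom serializers
```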
They did not use an IDE. Emacs was the default choice, which sounds like a personal preference more than a company-wide mandate. As the engineering team grew, that likely changed.
They also mentioned PyPy, which promises a faster and more flexible Python runtime.
Thrift between backend services
Thrift is used for communication between backend servers, and the Thrift service itself is written in C++. Facebook uses the same technology; Thrift was originally developed there.
Tornado for real-time updates
Tornado powers the real-time side of the system. It acts as the Comet server and holds open large numbers of long-polling and push-style connections.
Long polling instead of constant polling
Quora pages are not just static views. Pages need to update, and users can create questions, answers, and comments. That is why Quora uses long polling rather than traditional polling.
Traditional polling is the familiar loop of the browser asking, "Any updates?" The server says no. A little later the browser asks again. And again. The problem is obvious: the client is deciding when to ask, even though only the server knows when something has actually changed. At scale, all those repeated checks become expensive.
Long polling, one of the techniques grouped under the name Comet, flips that logic. The client sends a request and waits, and the server decides when to respond. The connection can stay open for a long time, say 60 seconds, while the server watches for updates. If something changes, it sends the data back to the browser. If nothing changes before the timeout, the server returns an empty response and the client opens a new request.
The benefit is fewer back-and-forth requests. The server controls the timing, so updates can arrive in a few milliseconds or after several seconds. It can also batch a group of changes and send them at once, which is more efficient.
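The article names Tornado but shows no code, so this sketch of the long-polling flow uses Python's standard `asyncio` library instead. It assumes a single subscriber per channel and a hypothetical 0.1-second batching window. The request waits until something is published or the timeout expires. It then returns everything that arrived during the window in one response.

```python
import asyncio


class LongPollChannel:
    """Hypothetical single-subscriber channel: the server decides when a waiting request returns."""

    def __init__(self):
        self._pending = []
        self._ready = asyncio.Event()

    def publish(self, update):
        self._pending.append(update)
        self._ready.set()

    async def wait_for_updates(self, timeout=60.0, batch_window=0.1):
        try:
            await asyncio.wait_for(self._ready.wait(), timeout)
        except asyncio.TimeoutError:
            return []  # nothing changed; the client simply opens a new request
        await asyncio.sleep(batch_window)  # let closely spaced changes ride in one response
        batch, self._pending = self._pending, []
        self._ready.clear()
        return batch


async def demo():
    channel = LongPollChannel()  # created inside the running event loop

    async def other_users():
        await asyncio.sleep(0.01)
        channel.publish("answer 7 upvoted")
        await asyncio.sleep(0.01)
        channel.publish("new comment on answer 7")

    writer = asyncio.create_task(other_users())
    first = await channel.wait_for_updates(timeout=1.0)  # both changes, one response
    idle = await channel.wait_for_updates(timeout=0.05)  # quiet period, empty response
    await writer
    return first, idle


print(asyncio.run(demo()))
```

The batching window trades a little latency for fewer responses. The server could just as well answer immediately on the first change.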
The downside is the number of open TCP connections. For a service with a million users, having just 10% of them online at once means roughly 100,000 concurrent connections. If some users have several Quora pages open in the browser, the count climbs even faster.
Still, long polling is not doomed by design. Servers built for this pattern can keep memory use very low for idle connections. Nginx is a good example: each worker process is single-threaded and event-driven, uses very little memory per connection, and can handle thousands of connections at the same time. That makes it relatively easy to scale into a system that can deal with huge concurrency.
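The event-driven idea is not specific to Nginx. Here is a rough illustration in Python's `asyncio`, not Nginx itself. One thread can hold many idle connections because each one is just a suspended coroutine waiting for data. The default of 100 connections is an arbitrary assumption chosen to keep the demo quick.

```python
import asyncio


async def hold_idle_connections(n=100):
    """Open n idle client connections against a one-thread asyncio server; return the peak it held."""
    open_now = 0
    peak = 0

    async def handle(reader, writer):
        nonlocal open_now, peak
        open_now += 1
        peak = max(peak, open_now)
        try:
            await reader.read()  # parked until the client hangs up: a suspended coroutine, no thread
        except ConnectionError:
            pass
        finally:
            open_now -= 1
            writer.close()

    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    clients = [await asyncio.open_connection("127.0.0.1", port) for _ in range(n)]
    for _ in range(200):  # give the accept loop up to about 2 s to register every client
        if peak == n:
            break
        await asyncio.sleep(0.01)
    for _, writer in clients:
        writer.close()
        await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return peak


print(asyncio.run(hold_idle_connections()))
```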
MySQL at the center
Like Facebook, Quora relies heavily on MySQL. The biggest challenge is partitioning the data. Their basic rule is to keep data on one machine whenever possible, and then use hashed primary keys to spread large datasets across multiple databases when needed. They avoid joins whenever they can.
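Here is a minimal sketch of hashed-key partitioning. It assumes four shards and MD5 as the hash; the article names neither. Hashing spreads sequential IDs evenly. Routing answers by their parent question keeps related rows on one machine, so a question page does not need a cross-database join. Plain modulo hashing has one drawback: changing the shard count moves most keys.

```python
import hashlib
from collections import Counter

SHARD_COUNT = 4  # hypothetical number of MySQL databases


def shard_for(key, shard_count=SHARD_COUNT):
    """Map a primary key to a shard by hashing it, so sequential IDs spread out evenly."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def shard_for_answer(question_id, answer_id):
    # answer_id is deliberately ignored: answers follow their parent question, so a
    # question and its answers share one database and need no cross-shard join.
    return shard_for(question_id)


spread = Counter(shard_for(question_id) for question_id in range(10_000))
print(sorted(spread.items()))  # roughly 2,500 keys per shard
```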
Adam has pointed to FriendFeed’s approach to storing schema-less data in MySQL and has also argued that you should not reach for NoSQL before your community has even hit one million users.
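FriendFeed described storing each entity as a serialized blob in a generic MySQL table. Separate index tables, kept up to date by application code, make individual attributes queryable. The sketch below follows that idea, using `sqlite3` as an in-memory stand-in for MySQL. The table names, JSON serialization, and helper functions are assumptions, not FriendFeed's or Quora's code.

```python
import json
import sqlite3
import uuid


def open_store():
    db = sqlite3.connect(":memory:")  # stand-in for MySQL in this sketch
    # One generic table holds every entity as an opaque serialized body.
    db.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    # Each queryable attribute gets its own small index table, maintained by application code.
    db.execute(
        "CREATE TABLE index_user_id ("
        " user_id TEXT NOT NULL, entity_id TEXT NOT NULL,"
        " PRIMARY KEY (user_id, entity_id))"
    )
    return db


def put(db, entity):
    """Store an entity (assigning an id in place if missing) and update the index table."""
    entity_id = entity.setdefault("id", uuid.uuid4().hex)
    with db:  # commit both writes together
        db.execute(
            "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
            (entity_id, json.dumps(entity)),
        )
        if "user_id" in entity:
            db.execute(
                "INSERT OR IGNORE INTO index_user_id (user_id, entity_id) VALUES (?, ?)",
                (entity["user_id"], entity_id),
            )
    return entity_id


def get(db, entity_id):
    row = db.execute("SELECT body FROM entities WHERE id = ?", (entity_id,)).fetchone()
    return json.loads(row[0]) if row else None


def by_user(db, user_id):
    # Two simple lookups instead of a join: index table first, then the entities.
    ids = [r[0] for r in db.execute(
        "SELECT entity_id FROM index_user_id WHERE user_id = ?", (user_id,))]
    return [get(db, entity_id) for entity_id in ids]


db = open_store()
put(db, {"user_id": "u1", "text": "First answer"})
put(db, {"user_id": "u1", "text": "Second answer", "anonymous": True})  # new field, no ALTER TABLE
print(sorted(e["text"] for e in by_user(db, "u1")))
```

Adding a field means writing it into the blob. Only fields that need to be queried get an index table.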
Quora is hardly alone here. Google, Twitter, Facebook, and FriendFeed all use MySQL in one form or another.
Caching, version control, and page speed
Memcached sits in front of MySQL as a cache layer.
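Memcached is typically used in front of MySQL through the cache-aside pattern. Reads check the cache first and fall back to MySQL on a miss, and writes invalidate the cached copy. So that the sketch runs anywhere, a plain dict with expiry times stands in for a Memcached client and a function stands in for the database. `CacheAside` and the 300-second TTL are hypothetical.

```python
import time


class CacheAside:
    """Hypothetical read-through cache in front of a slower store (stand-in for Memcached + MySQL)."""

    def __init__(self, load_from_db, ttl_seconds=300, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # key -> (expires_at, value)
        self.db_reads = 0

    def get(self, key):
        hit = self._cache.get(key)
        if hit and hit[0] > self._clock():
            return hit[1]  # fresh cache entry: no database access
        self.db_reads += 1
        value = self._load(key)
        self._cache[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key):
        # Call after writing to the database so the next read refetches fresh data.
        self._cache.pop(key, None)


def slow_db_lookup(question_id):
    return {"id": question_id, "title": "Question %d" % question_id}


cache = CacheAside(slow_db_lookup)
cache.get(1)
cache.get(1)         # served from the cache
cache.invalidate(1)  # e.g. after the question is edited
cache.get(1)
print(cache.db_reads)  # 2
```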
Git is the source control system.
If you inspect Quora’s page source, you will notice that JavaScript is placed at the very end of the page. The idea is simple: show content first, then load the scripts. That makes the page feel faster.
Charlie Cheever also pointed to Steve Souders’ rules for fast-loading websites as an influence on Quora’s performance work. The 14 rules are:


- Make Fewer HTTP Requests
- Use a Content Delivery Network
- Add an Expires Header
- Gzip Components
- Put Stylesheets at the Top
- Put Scripts at the Bottom
- Avoid CSS Expressions
- Make JavaScript and CSS External
- Reduce DNS Lookups
- Minify JavaScript
- Avoid Redirects
- Remove Duplicate Scripts
- Configure ETags
- Make AJAX Cacheable
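Several of these rules (Expires headers, gzip, ETags) are server-side settings. The sketch below applies them to a single static response in plain Python. The one-year lifetime, the weak content-hash ETag, and the `static_response` function are assumptions. In practice Nginx or CloudFront would usually set these headers.

```python
import gzip
import hashlib
import time
from email.utils import formatdate
from typing import Optional

ONE_YEAR = 365 * 24 * 3600  # far-future lifetime for versioned static files (an assumption)


def static_response(body: bytes, accept_encoding: str = "",
                    if_none_match: Optional[str] = None, now: Optional[float] = None):
    """Return (status, headers, payload) for a static asset following several of the rules above."""
    now = time.time() if now is None else now
    # Content-hash ETag: identical on every server, unlike inode-based defaults ("Configure ETags").
    etag = 'W/"' + hashlib.sha1(body).hexdigest() + '"'
    headers = {
        "ETag": etag,
        "Cache-Control": "public, max-age=%d" % ONE_YEAR,
        "Expires": formatdate(now + ONE_YEAR, usegmt=True),  # "Add an Expires Header"
        "Vary": "Accept-Encoding",
    }
    if if_none_match == etag:
        return 304, headers, b""  # the browser's copy is still valid: send no body
    if "gzip" in accept_encoding:  # "Gzip Components" (simplified: ignores q-values)
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return 200, headers, body


css = b"body { margin: 0; }\n" * 200
status, headers, payload = static_response(css, "gzip, deflate")
print(status, headers["Content-Encoding"], len(css), "->", len(payload))
status2, _, _ = static_response(css, "gzip", if_none_match=headers["ETag"])
print(status2)  # 304
```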