System Design
Bit.ly
Bit.ly is a URL shortener. Designing one is a common beginner system design question.
Steps for Designing
- Functional Requirements: The features the system must have to meet the needs of the user.
- Core Requirements:
- Users should be able to submit a long URL and receive a short one
- Users should be able to access the original URL from the short one (Forwarding)
- Non-Functional Requirements: Qualities that describe how the system operates while it provides the functional features.
- Short URLs should be unique
- The redirect delay should be minimal
- High availability (99.99% or higher)
- The system should scale to 1 billion URLs and 100M DAU (Daily Active Users)
Setup
Core Entities -- What Are They?
Core entities represent the primary objects in our system. They are derived from our requirements. Examples include "URL", "User", "Transaction", and so on. They often map directly to database tables but can also represent more abstract concepts.
When Are Entities Tables in a Database?
In a relational database design, most core entities will have corresponding tables.
- Each table normally represents one entity (e.g., a "Users" table for the User entity).
Some entities may not need direct table representation:
- Derived or computed entities (e.g., an aggregated click count)
- Temporary or in-memory entities used only for processing
When Are They Not Tables?
- NoSQL Databases
- Microservices
- When Aggregating Data
The core entities for the Bit.ly URL shortener are:
- Original URL: The URL from the user
- Short URL: The shortened URL that is sent to the user and mapped to the original URL for forwarding
- User: The user who created the shortened URL
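As a rough illustration, the three entities above could be modeled as plain data classes. This is only a sketch: the field names (`user_id`, `short_code`, `created_by`, `expires_at`) are hypothetical, and the original and short URL are grouped into one mapping record because they are always stored together.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class User:
    user_id: str  # hypothetical identifier for the creator


@dataclass
class UrlMapping:
    original_url: str  # the long URL submitted by the user
    short_code: str  # the part of the short URL after the domain
    created_by: str  # user_id of the User who created the mapping
    expires_at: Optional[datetime] = None  # None means no expiration


# Usage: one mapping created by one user.
owner = User(user_id="u1")
mapping = UrlMapping("https://example.com/some/path", "abc", owner.user_id)
```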
API
What Is It?
The API is the contract between the client and the server. It defines how data moves from client to server and back. There are many different types of APIs, but we will use REST with the standard HTTP methods.
(CRUD)
- POST: Create
- GET: Read
- PUT: Update
- DELETE: Delete
Before the APIs are built, we should consider the services offered and create a separation of concerns. There are actually two services being offered: a URL shortener and a forwarding service. The forwarding service relies entirely on the mappings that the shortener creates.
Shortening The URL POST Endpoint
This API endpoint takes the long URL, plus an optional custom alias and an optional expiration date.
// URL POST
{
"long_url": "https://example.com/some/long/ass/path",
"alias": "short_alias",
"exp_data": "optional_expiration_data"
}
->
{
"short_url": "http://short.ly/abc"
}
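A minimal sketch of how a server might read this request body. The function name `parse_shorten_request` is hypothetical, and the code assumes the expiration arrives as an ISO 8601 string, which the notes do not specify. The field names are taken from the example above.

```python
import json
from datetime import datetime
from typing import Optional, Tuple


def parse_shorten_request(body: str) -> Tuple[str, Optional[str], Optional[datetime]]:
    """Parse the POST body into (long_url, alias, expiration)."""
    payload = json.loads(body)
    long_url = payload["long_url"]  # required; raises KeyError if missing
    alias = payload.get("alias")  # optional custom alias
    raw_exp = payload.get("exp_data")  # optional; ISO 8601 format is an assumption
    expiration = datetime.fromisoformat(raw_exp) if raw_exp else None
    return long_url, alias, expiration
```

Treating `alias` and `exp_data` as optional keeps the simplest request down to a single `long_url` field.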
Redirection
//Redirect to original URL
GET /{short_code}
-> HTTP 302 Redirect to the original URL received from the user.
High-Level Design
We start the design by going one-by-one through our functional requirements and designing single systems to meet them.
URL Shortener (POST)
To meet the URL shortening requirement, the system should take a POST request from the user, compute a shortened URL (or use the optional alias), and then store the record in the database.
- User: Interacts with the system via an API endpoint.
- Server: Receives and processes the request from the client or user and handles all the logic, such as shortening the URL and checking it against already created URLs.
- Database: Stores the map of short codes to long URLs, along with the aliases and expiration dates.
When the system receives a POST request from the user:
- The server receives and validates the URL:
- Use an open-source library to validate the long URL
- Query the database to see if the long URL has already been shortened (record already exists)
- If the URL is valid and is not already in our database, we generate a short URL and store it in our database.
- Finally, we return the short URL to our user.
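The POST flow above can be sketched roughly as follows. Several pieces are assumptions made for illustration only: two in-memory dictionaries stand in for the database, the standard library's `urlparse` stands in for an open-source validation library, and the short code is a truncated SHA-256 hex digest. The names `shorten`, `is_valid_url`, `code_to_url`, and `url_to_code` are hypothetical.

```python
import hashlib
from typing import Dict, Optional
from urllib.parse import urlparse

BASE = "http://short.ly/"  # domain taken from the response example

# In-memory stand-ins for the database (assumption for this sketch).
code_to_url: Dict[str, str] = {}
url_to_code: Dict[str, str] = {}


def is_valid_url(url: str) -> bool:
    """Accept only http(s) URLs that include a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def shorten(long_url: str, alias: Optional[str] = None) -> str:
    if not is_valid_url(long_url):
        raise ValueError("invalid URL")
    # Reuse the existing record if this long URL was already shortened.
    if long_url in url_to_code:
        return BASE + url_to_code[long_url]
    if alias is not None:
        if alias in code_to_url:
            raise ValueError("alias already taken")
        code = alias
    else:
        digest = hashlib.sha256(long_url.encode("utf-8")).hexdigest()
        length = 7  # arbitrary starting length for the code
        code = digest[:length]
        # Lengthen the prefix until it does not collide with a stored code.
        while code in code_to_url and length < len(digest):
            length += 1
            code = digest[:length]
        if code in code_to_url:
            raise ValueError("could not generate a unique code")
    code_to_url[code] = long_url
    url_to_code[long_url] = code
    return BASE + code
```

Calling `shorten` twice with the same long URL returns the same short URL, which matches the "record already exists" check in the steps above.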
Access Original URL Via Short URL (GET)
Users should be able to access the original URL from the shortened URL.
When the system receives a GET request from the user with a shortened URL:
- The server will look up the short URL and verify that there is a match and that it has not expired.
- If the URL is valid and has not expired, the server will respond with a 302 redirect that points to the original long URL.
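A minimal sketch of the lookup and expiration check. The notes do not say what to return for an unknown or expired code, so the 404 and 410 responses here are assumptions, as are the names `Record` and `resolve`. A plain dictionary stands in for the database.

```python
from datetime import datetime, timezone
from typing import Dict, NamedTuple, Optional, Tuple


class Record(NamedTuple):
    long_url: str
    expiration: Optional[datetime]  # None means the link never expires


def resolve(records: Dict[str, Record], short_code: str,
            now: Optional[datetime] = None) -> Tuple[int, Dict[str, str]]:
    """Return (status code, headers) for a GET /{short_code} request."""
    now = now or datetime.now(timezone.utc)
    record = records.get(short_code)
    if record is None:
        return 404, {}  # no match; status choice is an assumption
    if record.expiration is not None and record.expiration <= now:
        return 410, {}  # expired; status choice is an assumption
    return 302, {"Location": record.long_url}
```

Passing `now` explicitly makes the expiration check easy to test without depending on the system clock.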
Some Scalability and Deep Dives
URL Uniqueness
I would imagine that using a hashing function would work. Adding a hash field to the URL entity could make chaining possible, especially if we use SHA-256, whose 256-bit output gives a very large space of possible hashes.
Column Name | Data Type |
---|---|
URL | VARCHAR(2048) |
shortURL | VARCHAR(255) |
hash | VARCHAR(64) |
created | TIMESTAMP |
expiration | TIMESTAMP |
createdBy | VARCHAR(255) |
Each new entry would have to read the previous entry's hash and then incorporate it into computing its own hash. This would create a chain. We could also add an authentication server that stores hashes in a hashmap that can be quickly searched for verification purposes.
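The chaining idea can be sketched as follows. This is one possible reading of it, not a settled design: the all-zero starting hash, the newline-joined input format, and the function names are assumptions. Each SHA-256 hex digest is 64 characters, which fits the VARCHAR(64) hash column in the table above.

```python
import hashlib
from typing import List, Tuple

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry (assumption)


def entry_hash(previous_hash: str, long_url: str, short_url: str) -> str:
    """Hash an entry together with the hash of the entry before it."""
    data = "\n".join([previous_hash, long_url, short_url]).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def build_chain(entries: List[Tuple[str, str]]) -> List[str]:
    """Compute the chained hash for every (long_url, short_url) entry in order."""
    hashes = []
    previous = GENESIS
    for long_url, short_url in entries:
        previous = entry_hash(previous, long_url, short_url)
        hashes.append(previous)
    return hashes


def verify_chain(entries: List[Tuple[str, str]], hashes: List[str]) -> bool:
    """Recompute the chain and compare it with the stored hashes."""
    return build_chain(entries) == hashes


entries = [
    ("https://example.com/a", "http://short.ly/abc"),
    ("https://example.com/b", "http://short.ly/abd"),
]
hashes = build_chain(entries)
known_hashes = set(hashes)  # quick membership checks, as in the hashmap idea
```

Because each hash depends on the one before it, changing an earlier entry changes every later hash, so `verify_chain` fails for a tampered list.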
Scale to 1B Shortened URLs and 100M DAU
Scaling can be done simply. We can have a separation of concerns between the URL shortener service and the forwarding service. We can assume that far fewer links will be created than looked up, since one user can shorten a link and anyone can use it to get to the original URL.
Horizontal scaling becomes easier if we separate the services onto different servers, so that each one can be scaled independently.
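To get a feel for the storage side of the 1 billion URL target, here is a rough upper-bound estimate based on the column widths in the URL Uniqueness table. It assumes one byte per character, 8 bytes per TIMESTAMP, and every VARCHAR filled to its maximum, so real usage should be noticeably lower.

```python
# Maximum bytes per column, taken from the table's declared widths.
COLUMN_BYTES = {
    "URL": 2048,
    "shortURL": 255,
    "hash": 64,
    "created": 8,  # assumed TIMESTAMP size
    "expiration": 8,  # assumed TIMESTAMP size
    "createdBy": 255,
}
TOTAL_URLS = 1_000_000_000  # scale target from the requirements

row_bytes = sum(COLUMN_BYTES.values())
total_bytes = row_bytes * TOTAL_URLS
total_tb = total_bytes / 1e12  # decimal terabytes

print(f"{row_bytes} bytes per row, about {total_tb:.2f} TB in total")
# prints "2638 bytes per row, about 2.64 TB in total"
```

Even this worst case is a few terabytes, which suggests that the number of rows is less of a concern than the read traffic on the forwarding service.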