[DevOps] System Design assignment

Hello, I’m WONIZZ, the developer of organizing things. In this session, we are going to organize the contents related to the architecture required by many companies.
This is a summary of arhitting for url shortner, which usually requires a lot.

I wrote the contents that I actually did the assignment myself, and I can learn many things while doing the assignment, so I post it to record it on my blog.

1. Traffic and System Capacity

Traffic

  • Assuming 200:1 read/write ratio
  • Number of unique shortened links generated per month = 100 million
  • Number of unique shortened links generated per seconds = 100 million /(30 days * 24 hours * 3600 seconds ) ~ 40 URLs/second
  • With 200:1 read/write ratio, number of redirections = 40 URLs/s * 200 = 8000 URLs/s

Storage

Assuming lifetime of service to be 50 years and with 100 million shortened links creation per month

  • total number of data points/objects in system will be = 100 million/month * 50 (years) * 12 (months) = 60 billion

Assuming size of each data object (Short url, long url, created date etc.) to be 500 bytes long

  • total require storage = 60 billion * 500 bytes =30TB

Memory

Following Pareto Principle, better known as the 80:20 rule for caching. (80% requests are for 20% data)

Since we get 8000 read/redirection requests per second, we will be getting 700 million requests per day:

  • 8000/s * 86400 seconds =~700 million

To cache 20% of these requests, we will need ~70GB of memory.

  • 0.2 * 700 million * 500 bytes = ~70GB

Estimates

Category Estimation
Shortened URLs generated 40/s
Shortened URLs generated in 50 years 60 billion
Total URL requests/redirections 8000/s
Storage for 100 years 30TB
Cache memory 70GB

2. Pseudo Code

Java cod for generating 7 characters long tiny url with a counter

base 62 are [0–9][a-z][A-Z]

URL with length 5, will give 62⁵ = ~916 Million URLs
URL with length 6, will give 62⁶ = ~56 Billion URLs
URL with length 7, will give 62⁷ = ~3500 Billion URLs

Since we required to produce 120 billion URLs, with 7 characters in base62 we will get ~3500 Billion URLs. Hence each of tiny url generated will have 7 characters

public class Codec {
Long counter = 1L;
Map<Long, String> indexToUrl = new HashMap<Long, String>();
Map<String, Long> urlToIndex = new HashMap<String, Long>();
String base62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

  // Encodes a URL to a shortened URL.
  public String encode(String longUrl) {
      if (urlToIndex.containsKey(longUrl)) {
          return "http://tinyurl.com/"+base62Encode(urlToIndex.get(longUrl));
      }
      else {
          indexToUrl.put(counter, longUrl);
          urlToIndex.put(longUrl, counter);
          counter++;
          return "http://tinyurl.com/"+base62Encode(urlToIndex.get(longUrl));
      }
  }

  // Decodes a shortened URL to its original URL.
  public String decode(String shortUrl) {
      String base62Encoded = shortUrl.substring(shortUrl.lastIndexOf("/") + 1);
      long decode = 0;
      for(int i = 0; i < base62Encoded.length(); i++) {
          decode = decode * 62 + base62.indexOf("" + base62Encoded.charAt(i));
      }
      return indexToUrl.get(decode);
  }

  private String base62Encode(long value) {
      StringBuilder sb = new StringBuilder();
      while (value != 0) {
          sb.append(base62.charAt((int)(value % 62)));
          value /= 62;
      }
      while (sb.length() < 6) {
          sb.append(0);
      }
      return sb.reverse().toString();
  }
}

start with a counter (A Large number 100000000000 in base 10 which 1L9zO9O in base 62) and increment counter every-time we get request for new short url

3. Database Schema

We can use a relational database as our backend store but since we don’t have joins between objects and we require huge storage size (60TB) along with high writes (40 URL’s per seconds) and read (8000/s) speed, a NoSQL store (MongoDB, Cassandra etc.) will be better choice.

Data Related to user

Name Type Desc
User ID Int A unique user id or API key to make user globally distinguishable
Name Varchar The name of the user
Email Varchar The email id of the user
Creation Date Datetime The date on which the user was registered

Data Related to ShortLink

Name Type Desc
Short URL Varchar 7 character long unique short URL
Original URL Varchar The original long URL
User ID Int The unique user id or API key of the user who created the short URL

We can use a relational database as our backend store but since we don’t have joins between objects and we require huge storage size (60TB) along with high writes (40 URL’s per seconds) and read (8000/s) speed, a NoSQL store (AWS DynamoDB, Cassandra etc.) will be better choice.

4. Key Generation Service (KGS)

a standalone Key Generation Service (KGS) that generates random seven-letter strings beforehand and stores them in a databas
Whenever we want to shorten a URL, we will take one of the already-generated keys and use it. This approach will make things quite simple and fast.

concurrency problem solution

As soon as KGS gives keys to one of the servers, it can move them to the used keys table. KGS can always keep some keys in memory to quickly provide them whenever a server needs them.

SPOF of KGS problem solution

To solve this, we can have a standby replica of KGS. Whenever the primary server dies, the standby server can take over to generate and provide keys.

5. Cache

We can use some off-the-shelf solution like Memchached, Before hitting backend storage, the application servers can quickly check if the cache has the desired URL.
As estimated above, we need 70GB memory to cache 20% of daily traffic. Since a modern-day server can have 256GB memory, we can easily fit all the cache into one machine.

Whenever there is a cache miss, our servers would be hitting a backend database. Whenever this happens, we can update the cache and pass the new entry to all the cache replicas

cache policy

Least Recently Used (LRU) can be a reasonable policy for our system. Under this policy, we discard the least recently used URL first.

To further increase the efficiency, we can replicate our caching servers to distribute the load between them.

6. Load Balancer ( LB )

  1. Between Clients and Application servers
  2. Between Application Servers and database servers ( AWS Managed )
  3. Between Application Servers and Cache servers ( AWS Managed )

Pros of LB

  • Balancing of Traffic( distributes incoming requests equally among backend servers. )
  • if a server is dead, LB will take it out of the rotation and will stop sending any traffic to it.
  • If a server is overloaded or slow, the LB will not stop sending new requests to that server

7. System Design

system_design_for_urlshortner system_design_for_urlshortner[/caption]

8. Conclusion

While carrying out this assignment, I learned a lot of things to consider in various system design situations.
In particular, I was able to learn about the concerns and the reasons for the design, such as rough traffic requirements, cache layer, and LB selection.

답글 남기기

이메일 주소는 공개되지 않습니다. 필수 필드는 *로 표시됩니다