Django REST framework: 打造高效 API-流量限制、分頁與過濾

2024 iThome 鐵人賽

DAY 19

Software Development

Django 2024: 從入門到SaaS實戰系列第 23 篇

16th鐵人賽 django django-rest-framework

blank

2024-10-05 23:21:30

520 瀏覽

分享至

在客戶端透過API請求中，如果客戶端想要提升查找數據的精確性與效率？

而站在服務端的角度，透過認證與權限的設計來達到提升安全性的效果，那如果用戶的確具備對應的權限與身份，又有哪些方式能防止API被濫用？

又或是想要減輕服務端的壓力，都應該如何更近一步優化我們的API？

我們今天來探討在Django REST framework（DRF）中可以透過哪些方式來進行API的優化：流量限制、過濾與分頁

今日重點：

DRF中設置流量限制
- DRF中限流的基礎原理
- DRF中不同類別的限流實踐
- 配置限流
在DRF中使用分頁
過濾與排序

DRF中設置流量限制

所謂的限制流量（限流），也就是如果單一客戶端的請求頻率超過了允許的限制，便會給予限制，實務上通常會返回一個429 Too Many Requests響應

而在DRF中，會根據幾種方式來找到API請求的來源，來判定客戶端的請求是否超出限制：

AnonRateThrottle：應用於所有匿名用戶，根據客戶端的IP位址來追蹤請求
UserRateThrottle：應用於已認證的用戶。根據用戶ID來追蹤請求
ScopedRateThrottle：這種類型的限流允許你根據視圖，也就是API endpoint本身進行限流

DRF中限流的基礎原理

上述提到的DRF提供的限流類別，都是繼承了BaseThrottle ，我們來看一下他是具備什麼方法

class BaseThrottle:
    """
    Rate throttling of requests.
    """

    def allow_request(self, request, view):
        """
        Return `True` if the request should be allowed, `False` otherwise.
        """
        raise NotImplementedError('.allow_request() must be overridden')

    def get_ident(self, request):
        """
        Identify the machine making the request by parsing HTTP_X_FORWARDED_FOR
        if present and number of proxies is > 0. If not use all of
        HTTP_X_FORWARDED_FOR if it is available, if not use REMOTE_ADDR.
        """
        xff = request.META.get('HTTP_X_FORWARDED_FOR')
        remote_addr = request.META.get('REMOTE_ADDR')
        num_proxies = api_settings.NUM_PROXIES

        if num_proxies is not None:
            if num_proxies == 0 or xff is None:
                return remote_addr
            addrs = xff.split(',')
            client_addr = addrs[-min(num_proxies, len(addrs))]
            return client_addr.strip()

        return ''.join(xff.split()) if xff else remote_addr

    def wait(self):
        """
        Optionally, return a recommended number of seconds to wait before
        the next request.
        """
        return None

allow_request：決定是否允許請求，因為是抽象方法，在子類別中需要自定義相關邏輯。當我們想要自定義限流時通常會改寫這個方法
wait：返回客戶端在下一個請求前應該等待的時間，單位為秒
get_ident：默認情況下是去判斷請求的ip位址，並且根據是否有代理伺服器，盡可能準確辨認初始客戶端ip

這時候可能會想，那這樣DRF要怎麼記住每一個客戶端的資料？

這時候我們可以看SimpleRateThrottle類別，該類別繼承BaseThrottle，並且被AnonRateThrottle、UserRateThrottle、ScopedRateThrottle所繼承

class SimpleRateThrottle(BaseThrottle):
    """
    A simple cache implementation, that only requires `.get_cache_key()`
    to be overridden.

    The rate (requests / seconds) is set by a `rate` attribute on the Throttle
    class.  The attribute is a string of the form 'number_of_requests/period'.

    Period should be one of: ('s', 'sec', 'm', 'min', 'h', 'hour', 'd', 'day')

    Previous request information used for throttling is stored in the cache.
    """
    cache = default_cache
    timer = time.time
    cache_format = 'throttle_%(scope)s_%(ident)s'
    scope = None
    THROTTLE_RATES = api_settings.DEFAULT_THROTTLE_RATES

    def __init__(self):
        if not getattr(self, 'rate', None):
            self.rate = self.get_rate()
        self.num_requests, self.duration = self.parse_rate(self.rate)

    def get_cache_key(self, request, view):
        """
        Should return a unique cache-key which can be used for throttling.
        Must be overridden.

        May return `None` if the request should not be throttled.
        """
        raise NotImplementedError('.get_cache_key() must be overridden')

    def get_rate(self):
        """
        Determine the string representation of the allowed request rate.
        """
        if not getattr(self, 'scope', None):
            msg = ("You must set either `.scope` or `.rate` for '%s' throttle" %
                   self.__class__.__name__)
            raise ImproperlyConfigured(msg)

        try:
            return self.THROTTLE_RATES[self.scope]
        except KeyError:
            msg = "No default throttle rate set for '%s' scope" % self.scope
            raise ImproperlyConfigured(msg)

    def parse_rate(self, rate):
        """
        Given the request rate string, return a two tuple of:
        <allowed number of requests>, <period of time in seconds>
        """
        if rate is None:
            return (None, None)
        num, period = rate.split('/')
        num_requests = int(num)
        duration = {'s': 1, 'm': 60, 'h': 3600, 'd': 86400}[period[0]]
        return (num_requests, duration)

    def allow_request(self, request, view):
        """
        Implement the check to see if the request should be throttled.

        On success calls `throttle_success`.
        On failure calls `throttle_failure`.
        """
        if self.rate is None:
            return True

        self.key = self.get_cache_key(request, view)
        if self.key is None:
            return True

        self.history = self.cache.get(self.key, [])
        self.now = self.timer()

        # Drop any requests from the history which have now passed the
        # throttle duration
        while self.history and self.history[-1] <= self.now - self.duration:
            self.history.pop()
        if len(self.history) >= self.num_requests:
            return self.throttle_failure()
        return self.throttle_success()

    def throttle_success(self):
        """
        Inserts the current request's timestamp along with the key
        into the cache.
        """
        self.history.insert(0, self.now)
        self.cache.set(self.key, self.history, self.duration)
        return True

    def throttle_failure(self):
        """
        Called when a request to the API has failed due to throttling.
        """
        return False

    def wait(self):
        """
        Returns the recommended next request time in seconds.
        """
        if self.history:
            remaining_duration = self.duration - (self.now - self.history[-1])
        else:
            remaining_duration = self.duration

        available_requests = self.num_requests - len(self.history) + 1
        if available_requests <= 0:
            return None

        return remaining_duration / float(available_requests)

首先來看屬性的部分：
- cache與cache_format：這就能解釋我們前面的疑問，DRF是透過儲存快取來知道對應的庫戶端的相關資料
- scope：獲取設置的節流速率
- THROTTLE_RATES：從DRF設置中獲取的默認節流速率
__init__：如果沒設置scope，則透過get_rate方法拿到對應的速率，接著透過parse_rate方法，返回在特定的時間窗口內允許的請求數
allow_request：在這個核心的方法中，先把已經不再時間窗口（self.duration）內的紀錄刪除，再來判斷時間窗口內的請求數量是否超標
throttle_success：當允許請求時，調用該方法來紀錄客戶端資料到快取中

根據上面的方法我們了解到，我們需要先設置默認或是特定視圖中允許請求的頻率，當客戶端發起請求時，根據快取的資料以及設置的請求頻率來決定是否要接受請求。如果超標則不接受，反之則紀錄客戶端資料到快取

DRF中不同類別的限流實踐

現在我們知道基礎的現流實現方式，我們就可以來看各類別的區別

AnonRateThrottle

如果用戶有註冊過則不需進行限流，反之則紀錄客戶端ip

class AnonRateThrottle(SimpleRateThrottle):
    """
    Limits the rate of API calls that may be made by a anonymous users.

    The IP address of the request will be used as the unique cache key.
    """
    scope = 'anon'

    def get_cache_key(self, request, view):
        if request.user and request.user.is_authenticated:
            return None  # Only throttle unauthenticated requests.

        return self.cache_format % {
            'scope': self.scope,
            'ident': self.get_ident(request)
        }

UserRateThrottle

如果是註冊的用戶，使用用戶id當作快取的key，反之則紀錄ip位址

class UserRateThrottle(SimpleRateThrottle):
    """
    Limits the rate of API calls that may be made by a given user.

    The user id will be used as a unique cache key if the user is
    authenticated.  For anonymous requests, the IP address of the request will
    be used.
    """
    scope = 'user'

    def get_cache_key(self, request, view):
        if request.user and request.user.is_authenticated:
            ident = request.user.pk
        else:
            ident = self.get_ident(request)

        return self.cache_format % {
            'scope': self.scope,
            'ident': ident
        }

ScopedRateThrottle

比較不一樣的地方是，在初始化時不會去判斷限流頻率，而是在調用視圖時才進行紀錄，並且紀錄方式與UserRateThrottle相同

class ScopedRateThrottle(SimpleRateThrottle):
    """
    Limits the rate of API calls by different amounts for various parts of
    the API.  Any view that has the `throttle_scope` property set will be
    throttled.  The unique cache key will be generated by concatenating the
    user id of the request, and the scope of the view being accessed.
    """
    scope_attr = 'throttle_scope'

    def __init__(self):
        # Override the usual SimpleRateThrottle, because we can't determine
        # the rate until called by the view.
        pass

    def allow_request(self, request, view):
        # We can only determine the scope once we're called by the view.
        self.scope = getattr(view, self.scope_attr, None)

        # If a view does not have a `throttle_scope` always allow the request
        if not self.scope:
            return True

        # Determine the allowed request rate as we normally would during
        # the `__init__` call.
        self.rate = self.get_rate()
        self.num_requests, self.duration = self.parse_rate(self.rate)

        # We can now proceed as normal.
        return super().allow_request(request, view)

    def get_cache_key(self, request, view):
        """
        If `view.throttle_scope` is not set, don't apply this throttle.

        Otherwise generate the unique cache key by concatenating the user id
        with the `.throttle_scope` property of the view.
        """
        if request.user and request.user.is_authenticated:
            ident = request.user.pk
        else:
            ident = self.get_ident(request)

        return self.cache_format % {
            'scope': self.scope,
            'ident': ident
        }

配置限流

如果要設置限流，可以直接在settings.py中配置全局變數，設置想要配置的限流類別與頻率

REST_FRAMEWORK = {
    'DEFAULT_THROTTLE_CLASSES': [
        'rest_framework.throttling.AnonRateThrottle',
        'rest_framework.throttling.UserRateThrottle',

    ],
    'DEFAULT_THROTTLE_RATES': {
        'anon': '2/min',
        'user': '10/min'
    }
}

在視圖中便可以直接調用

from rest_framework.throttling import UserRateThrottle

class WorkspaceViewSet(viewsets.ModelViewSet):
    throttle_classes = [UserRateThrottle]

那如果我們想要修改全局的頻率，可以新建throttles.py在app目錄下

from rest_framework.throttling import UserRateThrottle, ScopedRateThrottle

class WorkspaceThrottle(UserRateThrottle):
    THROTTLE_RATES = {"user": "5/day"}
    
# 如果是要配置scope的話，可以這樣寫
class DocumentThrottle(ScopedRateThrottle):
	  THROTTLE_RATES = {"document": "10/day"}

最後在視圖中調用即可

在DRF中使用分頁

分頁在API中扮演重要的角色：

性能優化：通過限制每次請求返回的資料量來減輕服務端的負載壓力，也能提升用戶體驗
資源管理：在處理大量資料時能夠更有效的管理伺服器資源
靈活性：可以支持客戶端根據需求來響應資料量

在DRF中有幾種類

PageNumberPagination：可以根據頁碼來請求特定的資料頁
LimitOffsetPagination：讓客戶端可以指定返回的每頁的資料量與起始位置（offset）
CursorPagination：加密分頁器，客戶端不能跳到任意位置，只提供前進與後退控制。但是必須要模型中有created欄位

使用的方法可以全局設定

REST_FRAMEWORK = {
    'DEFAULT_PAGINATION_CLASS': 'rest_framework.pagination.PageNumberPagination',
    'PAGE_SIZE': 10
}

或是自定義類別來修改參數

from rest_framework.pagination import LimitOffsetPagination

class CustomPagination(LimitOffsetPagination):
    default_limit = 10
    max_limit = 100
    
    
class WorkspaceViewSet(viewsets.ModelViewSet):
    pagination_class = CustomPagination

而請求時則是直接調用想要的資料範圍

http://127.0.0.1:8000/note/workspaces/?page=1 # PageNumberPagination
http://127.0.0.1:8000/note/workspaces/?limit=1&offset=0 # LimitOffsetPagination

過濾與排序

實現過濾的功能，在API請求時也相當重要，除了提高效率之外，也能改善用戶體驗，使客戶端能快速找到所需的資訊。同時能夠直接在調用API時根據特定欄位來進行排序，也能進一步提升客戶端體驗

如果使用DRF原生方法進行過濾，可以分成視圖函式與視圖類別：

視圖函式可以根據self.request.query_params.get() 拿到客戶端的查詢關鍵字參數
視圖類別則是之前就介紹過的改寫get_queryset方法，在Django REST framework: 視圖的進化之旅 - GenericAPI 到 ViewSet，從通用基礎到高層抽象中已經有提及過，這邊就不再贅述

此外DRF還提供了不同的API來快速使用過濾功能

SearchFilter：

from rest_framework import filters

class DocumentViewSet(viewsets.ModelViewSet):
    queryset = Document.objects.all()
    serializer_class = DocumentSerializer
    permission_classes = [permissions.IsAuthenticated]
    filter_backends = [filters.SearchFilter]
    search_fields = ["title", "content"]

在視圖類別中可以設定filter_backends屬性為filters.SearchFilter
search_fields中通常我們會選擇具備text type的欄位進行搜尋
如果是JSON欄位，可以透過雙底線來進行資料的解構

# 來源於官方文檔
search_fields = ['data__breed', 'data__owner__other_pets__0__name']

當我們在視圖中配置filter時，我們就能透過路由來對特定來為進行搜尋

此外我們也能根據外鍵或是多對多的欄位進行搜索，例如以我們的Document model來說，我們也能透過workspace欄位拿到owner的名稱

class DocumentViewSet(viewsets.ModelViewSet):
    queryset = Document.objects.all()
    serializer_class = DocumentSerializer
    permission_classes = [permissions.IsAuthenticated]
    filter_backends = [filters.SearchFilter]
    search_fields = ["workspace__owner__username"]# 透過雙底線來解構

而在排序方面，DRF則是提供OrderingFilter類別快速設定排序功能

class DocumentViewSet(viewsets.ModelViewSet):
    queryset = Document.objects.all()
    serializer_class = DocumentSerializer
    permission_classes = [permissions.IsAuthenticated]
    filter_backends = [filters.SearchFilter, filters.OrderingFilter] # 一起配置不衝突
    search_fields = ["workspace__owner__username"]
    ordering_fields = ["created_at", "updated_at", "id"] # 設定想要排序的欄位

路由配置

# 正冪排序
http://127.0.0.1:8000/note/documents/?ordering=created_at

# 降冪排序
http://127.0.0.1:8000/note/documents/?ordering=-created_at

# 多個欄位排序時，可以根據順序
http://127.0.0.1:8000/note/documents/?ordering=-id,created_at