In the PC era, when a user asks for a page, we will first check whether a SessionId exists in the user’s cookie, and if it does not exist, we will generate a SessionId by random number and store it in the cookie. If it exists, we will update the expiration time of this cookie (after 30 minutes). That is, as long as the user visits the interval within 30 minutes is considered the same Seesion, more than 30 minutes will generate a new SeesionId and define browsing as a new Session.

In APP era or applet era, we usually record each opening of the app as a Seesion, and the concept of cookie is abandoned, but an important issue is ignored in the middle: in the process of using the app or applet, it is very likely to be interrupted by other applications, such as phone, SMS, WeChat, push, etc. When the user cut the screen after Session records will be changed, the statistics of Session data at this time is often inaccurate. Today we are going to share how to define Session by means of SQL.

Clear thinking

In user browsing logs, we are usually able to record the user’s identity and access time. In the Session definition we first need to identify the unique user and sort the logs by the user’s browsing time, and after the processing is completed we need to calculate the time difference between the logs and locate the browsing greater than 30 minutes as a new Session.

Logic to SQL

1 Calculate the last visit time for each visit

This can be achieved using the window function LAG. The specific code is as follows.

1
2
SELECT *, LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event
FROM tutorial.playbook_events

The execution effect is as follows.

Where.

  • user_id: user identity ID
  • occurred_at: current access time
  • last_event: time of last access

2 Calculate the access time difference to determine if it is a new access

Determining whether a visit is new or not is done in two main ways.

  • This visit does not have the time of the last visit
  • the time difference between this visit and the last visit > 30 minutes (30 minutes or 10 minutes can be defined according to the business scenario)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
SELECT *
    , CASE 
        WHEN unix_timestamp(occurred_at, 'yyyy-MM-dd HH:mm:ss') - unix_timestamp(last_event, 'yyyy-MM-dd HH:mm:ss') >= 60 * 30
        OR last_event IS NULL THEN 1
        ELSE 0
    END AS is_new_session
FROM (
    SELECT *, LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event
    FROM tutorial.playbook_events
) t1

The execution effect is as follows.

2 Define SessionId

With the above data, there is still a SessionId missing, and the way to achieve it is still through the window function. The specific code is as follows.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
SELECT user_id, occurred_at, SUM(is_new_session) OVER (ORDER BY user_id, occurred_at ROWS BETWEEN unbounded PRECEDING AND CURRENT ROW) AS global_session_id
    , SUM(is_new_session) OVER (PARTITION BY user_id ORDER BY occurred_at) AS user_session_id
FROM (
    SELECT *
        , CASE 
            WHEN unix_timestamp(occurred_at, 'yyyy-MM-dd HH:mm:ss') - unix_timestamp(last_event, 'yyyy-MM-dd HH:mm:ss') >= 60 * 30
            OR last_event IS NULL THEN 1
            ELSE 0
        END AS is_new_session
    FROM (
        SELECT *, LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event
        FROM tutorial.playbook_events
    ) t1
) t2

The execution effect is as follows.