0.Index

1.PID(Process ID)
- 1.1 ??
- 1.2 ??
2.Allocating a PID
- 2.1 alloc_pid()
  - ??

1.PID(Process ID)

A system contains one or more tasks, so an ID is needed to tell them apart. In Linux the PID plays this role. A PID is allocated when a task is created and released when the task exits.

Several data structures are maintained to manage task PIDs.

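In the Linux kernel, "PID" refers both to an integer value (pid_t) and to a structure (struct pid), and the two meanings are used interchangeably. From the userspace point of view, however, it only ever means the integer value.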
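
The userspace view can be shown with a minimal sketch, assuming a POSIX system. The helper name child_pid_differs() is made up for this illustration. It only shows that a new task receives its own integer pid_t and that the parent reaps it afterwards.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child, reap it, and report whether the child was given a
 * PID different from ours. Returns 1 if so, 0 if not, -1 on error.
 */
int child_pid_differs(void)
{
        pid_t parent = getpid();
        pid_t child = fork();

        if (child < 0)
                return -1;
        if (child == 0)
                _exit(0);               /* child: exit immediately */

        /* Once the child has been reaped, its PID can be handed out again. */
        if (waitpid(child, NULL, 0) != child)
                return -1;
        return child != parent;
}
```

Everything here works with pid_t only. struct pid never appears in userspace code.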

2.Allocating a PID

The pid structure is allocated while a task is being created. The function used for this is alloc_pid(). It is called only from copy_process().

copy_process()
-> pid = alloc_pid()                      [1]
-> p->pid = pid_nr(pid)                   [2]
-> init_task_pid(p, PIDTYPE_PID, pid)     [3]
-> attach_pid(p, PIDTYPE_PID)             [4]

The pid_t pid is set at [2] by reading the numbers field of the pid structure returned by alloc_pid(). The numbers field is an array of struct upid. For now, just think of it as the variable that holds the integer pid allocated from the pidmap.

static inline pid_t pid_nr(struct pid *pid)
{
        pid_t nr = 0;
        if (pid)
                nr = pid->numbers[0].nr;
        return nr; 
}

The integer pid is kept in the pid field of task_struct.

struct task_struct{
      ...
      pid_t pid;
      ...
}

The pid structure allocated to a task is linked to it through the pids[] array of task_struct. This is done by init_task_pid().

The pid structure, for its part, keeps the tasks that use it in a list. This is done by attach_pid().
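
The two directions of the link (task to pid, pid to task) can be modeled in userspace. This is only a sketch under simplifying assumptions: the names with a model_ prefix are hypothetical, and a plain singly linked list stands in for the RCU-protected hlist the kernel uses.

```c
#include <stddef.h>

enum pid_type { PIDTYPE_PID, PIDTYPE_PGID, PIDTYPE_SID, PIDTYPE_MAX };

struct model_task;

/* Simplified stand-in for struct pid: one task list per pid type. */
struct model_pid {
        int nr;
        struct model_task *tasks[PIDTYPE_MAX];
};

/* Simplified stand-in for task_struct. */
struct model_task {
        int pid;                                /* integer pid */
        struct model_pid *pids[PIDTYPE_MAX];    /* task -> pid structure */
        struct model_task *next[PIDTYPE_MAX];   /* link inside pid->tasks[type] */
};

/* Step [3]: make the task point to the pid structure. */
void model_init_task_pid(struct model_task *t, enum pid_type type,
                         struct model_pid *pid)
{
        t->pids[type] = pid;
}

/* Step [4]: push the task onto the head of the pid's task list. */
void model_attach_pid(struct model_task *t, enum pid_type type)
{
        struct model_pid *pid = t->pids[type];

        t->next[type] = pid->tasks[type];
        pid->tasks[type] = t;
}

/* Runs steps [2] to [4]; returns 1 if both directions are linked. */
int model_link_demo(void)
{
        struct model_pid pid = { .nr = 42 };
        struct model_task t = { 0 };

        t.pid = pid.nr;                         /* step [2] */
        model_init_task_pid(&t, PIDTYPE_PID, &pid);
        model_attach_pid(&t, PIDTYPE_PID);
        return t.pids[PIDTYPE_PID] == &pid && pid.tasks[PIDTYPE_PID] == &t;
}
```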

alloc_pid() - allocating a PID for a newly created task

alloc_pid() in kernel/pid.c, 1/3

struct pid *alloc_pid(struct pid_namespace *ns)
{
        struct pid *pid;
        enum pid_type type;
        int i, nr; 
        struct pid_namespace *tmp;
        struct upid *upid;
        int retval = -ENOMEM;

        pid = kmem_cache_alloc(ns->pid_cachep, GFP_KERNEL);
        if (!pid)
                return ERR_PTR(retval);

        tmp = ns; 
        pid->level = ns->level;
        for (i = ns->level; i >= 0; i--) {
                nr = alloc_pidmap(tmp);
                if (IS_ERR_VALUE(nr)) {
                        retval = nr; 
                        goto out_free;
                }   
                pid->numbers[i].nr = nr; 
                pid->numbers[i].ns = tmp;
                tmp = tmp->parent;
        }

The pid structure is allocated from a slab cache created in advance. Note that the slab cache is managed separately for each namespace. For now, assume the system has a single namespace (init_pid_ns). Its slab cache is created in pidmap_init().

void __init pidmap_init(void)
{
        ...
        init_pid_ns.pid_cachep = KMEM_CACHE(pid,
                        SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT);

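After the pid structure has been allocated, an integer pid is allocated from a bitmap data structure called the pidmap. The pidmap is also managed separately for each namespace. The pidmap structure is very simple: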

- a pointer to the page used as the bitmap
- a variable holding the number of bits left in the bitmap (the number of integer pids still available)

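The pidmap of init_pid_ns is initialized at compile time so that every entry has nr_free set to 32768 (the number of bits in a 4 KB page) and the page pointer set to NULL. At boot time, pidmap_init() allocates one page, reserves integer pid 0 for the idle thread that is carrying out the boot, and decrements nr_free by 1.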

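In any case, once alloc_pidmap() hands out an integer pid, the value is stored in the upid of the pid structure. Allocation and assignment are repeated while walking up the namespace hierarchy. For now we keep assuming that the system has only one namespace.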
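
The pidmap idea (a page used as a bitmap plus a free counter) can be sketched in userspace as follows. This is a simplified model, not the kernel's alloc_pidmap(). All pidmap_model_* names are hypothetical, a 4 KB page is assumed, and the sketch always returns the lowest free number rather than reproducing the kernel's search order.

```c
#include <limits.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE         4096
#define MODEL_BITS_PER_PAGE     (MODEL_PAGE_SIZE * CHAR_BIT)

struct pidmap_model {
        int nr_free;            /* bits (integer pids) still available */
        unsigned char *page;    /* page used as the bitmap */
};

/* Boot-time setup as described above: get a page and reserve pid 0. */
int pidmap_model_init(struct pidmap_model *map)
{
        map->nr_free = MODEL_BITS_PER_PAGE;
        map->page = calloc(1, MODEL_PAGE_SIZE);
        if (!map->page)
                return -1;
        map->page[0] |= 1;      /* pid 0 is reserved */
        map->nr_free--;
        return 0;
}

/* Return the lowest free number and mark it used, or -1 if none is left. */
int pidmap_model_alloc(struct pidmap_model *map)
{
        if (map->nr_free == 0)
                return -1;
        for (int nr = 0; nr < MODEL_BITS_PER_PAGE; nr++) {
                unsigned char mask = (unsigned char)(1u << (nr % CHAR_BIT));

                if (!(map->page[nr / CHAR_BIT] & mask)) {
                        map->page[nr / CHAR_BIT] |= mask;
                        map->nr_free--;
                        return nr;
                }
        }
        return -1;
}

/* Clear the bit so the number can be handed out again. */
void pidmap_model_free(struct pidmap_model *map, int nr)
{
        unsigned char mask = (unsigned char)(1u << (nr % CHAR_BIT));

        map->page[nr / CHAR_BIT] &= (unsigned char)~mask;
        map->nr_free++;
}
```

After pidmap_model_init() the first call to pidmap_model_alloc() returns 1, because bit 0 is already taken.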

alloc_pid() in kernel/pid.c, 2/3

        if (unlikely(is_child_reaper(pid))) {             [1]
                if (pid_ns_prepare_proc(ns))
                        goto out_free;
        }

        get_pid_ns(ns);
        atomic_set(&pid->count, 1); 
        for (type = 0; type < PIDTYPE_MAX; ++type)             [2]
                INIT_HLIST_HEAD(&pid->tasks[type]);

        upid = pid->numbers + ns->level;             [3]
        spin_lock_irq(&pidmap_lock);
        if (!(ns->nr_hashed & PIDNS_HASH_ADDING))
                goto out_unlock;
        for ( ; upid >= pid->numbers; --upid) {             [4]
                hlist_add_head_rcu(&upid->pid_chain,
                                &pid_hash[pid_hashfn(upid->nr, upid->ns)]);
                upid->ns->nr_hashed++;
        }
        spin_unlock_irq(&pidmap_lock);
        return pid;

In code block 1, if the pid is for the init process (child reaper) of the namespace, pid_ns_prepare_proc() is called. If that call fails, the function bails out through out_free.

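A task keeps a field that points to the pid structure associated with it. The pid structure in turn keeps the list of tasks associated with it in its tasks[] field. Code block 2 initializes the list heads in tasks[]. Linking a task to the pid structure is done later, by attach_pid().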

The pid structure stores the integer pid that represents it in a given namespace in a upid. This is the numbers field.

struct pid
{
        ...
        struct upid numbers[1];

Code block 3 uses the level of the namespace passed to alloc_pid() (ns->level) to find the upid that maps to that namespace.

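In code block 4, each upid, from that level down to numbers[0], is added to the pid_hash bucket selected by pid_hashfn(upid->nr, upid->ns), and nr_hashed of its namespace is incremented.

alloc_pid() in kernel/pid.c, 3/3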

out_unlock:
        spin_unlock_irq(&pidmap_lock);
        put_pid_ns(ns);

out_free:
        while (++i <= ns->level)
                free_pidmap(pid->numbers + i); 

        kmem_cache_free(ns->pid_cachep, pid);
        return ERR_PTR(retval);
}

When a task is created, a PID has to be allocated for it. alloc_pid() is the function used for this. It does the following:

alloc pid struct
alloc pid nr from pidmap
set pid->count to 1
link upid to pidhash

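The function finally returns the pid structure. It first allocates one structure and then allocates a pid number from the pidmap. The allocated number is stored in the pid structure. Because the pid structure is now going to be used by a task, count is set to 1 before the function returns.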
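
The "link upid to pidhash" step can be modeled in userspace as a small chained hash table keyed by the pair (number, namespace). This is a sketch only. The hash_* names are hypothetical, and hash_fn() is a made-up mixing function, not the kernel's pid_hashfn().

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_SIZE 16

struct hash_ns { int level; };

struct hash_upid {
        int nr;
        const struct hash_ns *ns;
        struct hash_upid *next;         /* chain inside one bucket */
};

static struct hash_upid *hash_table[HASH_SIZE];

/* Hypothetical hash over (nr, ns); the kernel uses its own function. */
static unsigned int hash_fn(int nr, const struct hash_ns *ns)
{
        return ((unsigned int)nr + (unsigned int)((uintptr_t)ns >> 4)) % HASH_SIZE;
}

/* Insert at the head of the bucket chain. */
void hash_add(struct hash_upid *upid)
{
        unsigned int b = hash_fn(upid->nr, upid->ns);

        upid->next = hash_table[b];
        hash_table[b] = upid;
}

/* Look up by number and namespace; both must match. */
struct hash_upid *hash_find(int nr, const struct hash_ns *ns)
{
        struct hash_upid *u = hash_table[hash_fn(nr, ns)];

        for (; u; u = u->next)
                if (u->nr == nr && u->ns == ns)
                        return u;
        return NULL;
}
```

Because the namespace is part of the key, the same number can be present in the table once per namespace without the entries being confused.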

The pid structure comes from a slab cache, and that cache is managed per namespace rather than system-wide. Unless additional namespaces have been created, every process uses the init namespace, so the slab cache of the init namespace is used.

pid_nr() - finding the global pid number

The integer task PID that we are familiar with is stored in the pid member of task_struct, whose type is pid_t (an int).

struct task_struct{
      ...
      pid_t pid;
      ...
}

The value stored in this variable is called the global PID. PIDs can be divided into global PIDs and virtual PIDs.

Before describing the two kinds of pid, pid namespaces need to be explained. If the system provides 32768 PID numbers, the usable range is 0 to 32767. A given PID is unique and can be used only by the task it was allocated to. With the introduction of pid namespaces, however, more than one pid number space can exist. As a result, a PID that is already in use in one pid namespace can be used again in a different pid namespace.

PID namespaces can form a hierarchy. The init pid namespace is created by default at boot, and every namespace added later forms a hierarchy with it.

When a PID number is allocated in a child pid namespace of the hierarchy, a PID number is allocated in the parent pid namespace as well. This repeats until the init pid namespace is reached.

In other words, only one pid structure is allocated, but it can be given several PID numbers.

For example, when a PID number is allocated in a child namespace, a pid is also allocated in the parent pid namespace. So if a pid namespace has two child pid namespaces and each child allocates two PID numbers, the parent allocates four pids in total.

This is why a parent can see the tasks created in its child pid namespaces, while the children can see neither each other's tasks nor the parent's.
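
The counting example above (two children, two PID numbers each, four in the parent) can be checked with a small userspace model. It is a sketch under simplifying assumptions: the tree_* names are hypothetical, and a simple counter stands in for the pidmap.

```c
#include <stddef.h>

#define TREE_MAX_LEVEL 4

struct tree_ns {
        int level;                  /* 0 for the init namespace */
        struct tree_ns *parent;     /* NULL for the init namespace */
        int last;                   /* last number handed out here */
        int allocated;              /* how many numbers were handed out here */
};

struct tree_pid {
        int level;
        int numbers[TREE_MAX_LEVEL + 1];    /* numbers[i]: pid seen at level i */
};

/* Take one number from ns and from every ancestor up to level 0. */
void tree_alloc_pid(struct tree_ns *ns, struct tree_pid *pid)
{
        struct tree_ns *tmp = ns;

        pid->level = ns->level;
        for (int i = ns->level; i >= 0; i--) {
                pid->numbers[i] = ++tmp->last;
                tmp->allocated++;
                tmp = tmp->parent;
        }
}

/* Two children with two tasks each; returns the parent's allocation count. */
int tree_parent_count(void)
{
        struct tree_ns parent = { 0, NULL, 0, 0 };
        struct tree_ns c1 = { 1, &parent, 0, 0 };
        struct tree_ns c2 = { 1, &parent, 0, 0 };
        struct tree_pid p[4];

        tree_alloc_pid(&c1, &p[0]);
        tree_alloc_pid(&c1, &p[1]);
        tree_alloc_pid(&c2, &p[2]);
        tree_alloc_pid(&c2, &p[3]);
        return parent.allocated;
}
```

In this model both children hand out the numbers 1 and 2, while the parent hands out four distinct numbers for the same four tasks.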

For reference, Linux containers, the technology used by the currently popular Docker, are built on namespaces.

kernel/pid.c

struct pid *alloc_pid(struct pid_namespace *ns)
{
        for (i = ns->level; i >= 0; i--) {
                nr = alloc_pidmap(tmp);
                ...
                pid->numbers[i].nr = nr;
                pid->numbers[i].ns = tmp;
                tmp = tmp->parent;
        }
        ...
}

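Starting from @ns, where the pid allocation began, the loop walks up the hierarchy to init (ns->level == 0) and allocates a PID number from the pidmap at each level. Each integer PID obtained this way is stored in numbers[].nr of the pid structure.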

The global PID is the PID number allocated from the init pid namespace in this loop. pid_nr() returns pid->numbers[0].nr because the level of the init pid namespace is 0.

include/linux/pid.h

static inline pid_t pid_nr(struct pid *pid)
{
        pid_t nr = 0;
        if (pid)
                nr = pid->numbers[0].nr;
        return nr; 
}

Depending on which pid namespace it is viewed from, a pid is one of the following two kinds.

global pid
- the pid as seen from the init namespace
virtual pid
- the pid as seen from the pid namespace of the current task

init_task_pid() - pointing the task at its pid structure

This function makes the task point to the pid structure that was allocated by calling alloc_pid().

kernel/fork.c

static inline void 
init_task_pid(struct task_struct *task, enum pid_type type, struct pid *pid)
{
         task->pids[type].pid = pid; 
}

A more detailed explanation is needed..

attach_pid() - adding the task to the list of tasks that use the pid structure

A more detailed explanation is needed..

PIDs released when a process exits

pidmap

pidhash

pid API

pid_nr()
- return global pid seen from init pid ns
pid_vnr()
- return pid seen from current pid ns
pid_nr_ns()
- return pid seen from pid ns specified
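
The difference between the three lookups can be sketched in userspace. This is a simplified model with hypothetical view_* names. The pid_vnr() case corresponds to calling view_pid_nr_ns() with the namespace of the current task, which the sketch leaves to the caller.

```c
#include <stddef.h>

#define VIEW_MAX_LEVEL 2

struct view_ns { int level; };

struct view_upid {
        int nr;
        const struct view_ns *ns;
};

struct view_pid {
        int level;                                      /* deepest level */
        struct view_upid numbers[VIEW_MAX_LEVEL + 1];   /* one entry per level */
};

/* Number seen from the init namespace (level 0). */
int view_pid_nr(const struct view_pid *pid)
{
        return pid ? pid->numbers[0].nr : 0;
}

/* Number seen from ns, or 0 if the pid is not visible from there. */
int view_pid_nr_ns(const struct view_pid *pid, const struct view_ns *ns)
{
        if (pid && ns->level <= pid->level &&
            pid->numbers[ns->level].ns == ns)
                return pid->numbers[ns->level].nr;
        return 0;
}
```

A pid created in a child namespace has one number per level, so the same structure answers with a different integer depending on the namespace asking. A sibling namespace at the same level gets 0 because the stored namespace pointer does not match.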

The data structure that represents a pid is the pid structure, defined as follows.

include/linux/pid.h

struct pid 
{
        atomic_t count;
        unsigned int level;
        /* lists of tasks that use this pid */
        struct hlist_head tasks[PIDTYPE_MAX];
        struct rcu_head rcu;
        struct upid numbers[1];
};

The pid structure is allocated from a slab cache that is created in advance at boot.

kernel/pid.c

void __init pidmap_init(void)
{
...
        init_pid_ns.pid_cachep = KMEM_CACHE(pid,
                        SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT);
}

As the code shows, the cache is attached to init_pid_ns, the default pid namespace created at boot. This means that the pid slab cache is managed per namespace.
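
One plausible reason for a per-namespace cache is that a pid living at level N needs N + 1 upid entries, so the object size depends on the namespace level. The sketch below illustrates that sizing in userspace. The sized_* names are hypothetical, calloc() stands in for the slab cache, and a C99 flexible array member is used in place of the numbers[1] declaration shown above.

```c
#include <stddef.h>
#include <stdlib.h>

struct sized_upid {
        int nr;
        void *ns;
};

struct sized_pid {
        unsigned int level;
        struct sized_upid numbers[];    /* level + 1 entries follow */
};

/* Bytes needed for a pid that lives in a namespace at the given level. */
size_t sized_pid_bytes(unsigned int level)
{
        return sizeof(struct sized_pid) +
               ((size_t)level + 1) * sizeof(struct sized_upid);
}

/* Allocate a zeroed pid with room for one upid per level. */
struct sized_pid *sized_pid_alloc(unsigned int level)
{
        struct sized_pid *pid = calloc(1, sized_pid_bytes(level));

        if (pid)
                pid->level = level;
        return pid;
}
```

Each extra namespace level adds exactly one struct sized_upid to the allocation size.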

The per-process namespaces of init task, the first task in the system, are set to the global variable init_nsproxy.

#define INIT_TASK(tsk)                   
{                                        
...
        .nsproxy        = &init_nsproxy,
...

init_nsproxy is a variable of type struct nsproxy. This structure describes all of the namespace information of a process.

struct nsproxy init_nsproxy = { 
...
        .pid_ns_for_children    = &init_pid_ns,
...
};

Among these, the pid namespace points to init_pid_ns, which in turn refers to the slab cache created at boot. When a new task is created, it must be given a PID of its own. This is done by calling alloc_pid().
