Concept

  • OSD: the daemon responsible for operating a hard disk; one OSD per disk
  • MON: maintains cluster state; it is important, so one is usually run on each of several nodes
  • MGR: monitors cluster status
  • RGW (optional): provides the object storage API
  • MDS (optional): provides CephFS
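
To see which of these daemons are running and whether the cluster is healthy, check the overall cluster status:

ceph -s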

Ways to use Ceph for storage.

  1. librados: library
  2. radosgw: Object Storage HTTP API
  3. rbd: block storage
  4. cephfs: file system

Authentication

Ceph client authentication requires a username + key. By default, the username is client.admin and the keyring path is /etc/ceph/ceph.<username>.keyring (for example /etc/ceph/ceph.client.admin.keyring). ceph --user abc indicates that the cluster is accessed as user client.abc.

A user's permissions (capabilities) are granted per service type (mon, osd, mds, mgr). Use ceph auth ls to show all users and their permissions.

$ ceph auth ls
osd.0
        key: REDACTED
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
client.admin
        key: REDACTED
        caps: [mds] allow *
        caps: [mgr] allow *
        caps: [mon] allow *
        caps: [osd] allow *

As you can see, osd.0 has all permissions for OSD, and only osd-related permissions for mgr and mon; client.admin has all permissions. A profile can be thought of as a predefined collection of permissions.

Create a new user and grant permissions.

ceph auth get-or-create client.abc mon 'allow r'

Modify a user's permissions.

ceph auth caps client.abc mon 'allow rw'

Get a user's permissions.

ceph auth get client.abc

Print a user's key.

ceph auth print-key client.abc
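
To actually remove a user (rather than just printing its key), ceph auth del can be used:

ceph auth del client.abc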

OSD

Managing OSDs is actually managing the hard drives that store your data.

Check the status.

ceph osd stat

Shows how many online and offline OSDs there are.

ceph osd tree

Shows the CRUSH hierarchy: the non-negative IDs are actual OSDs, while the negative IDs are the levels above them, such as hosts, chassis, racks, and the root.
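
Per-OSD disk utilization (size, usage, and PG count of each OSD) can also be checked with ceph osd df:

ceph osd df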

Pool

A pool is a storage pool; the RBD and CephFS features described later need a pool to work with.

Create a storage pool.

ceph osd pool create xxx
ceph osd pool create xxx PG_NUM

For performance reasons, you can set the number of PGs (Placement Groups). By default, a replicated pool is created, which stores multiple copies of the data, similar to RAID 1. A pool can also be created as an erasure-coded pool, similar to RAID 5.

The data in each placement group is stored on the same set of OSDs; objects are distributed across the PGs by hashing.
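
For example, an erasure-coded pool can be created by appending the erasure keyword (the pool name xxx_ec and the PG count 128 are just placeholders):

ceph osd pool create xxx_ec 128 128 erasure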

List all storage pools.

ceph osd lspools

View storage pool usage.

rados df

View the I/O status of the storage pools.

ceph osd pool stats

Take a snapshot of the storage pool.

ceph osd pool mksnap xxx snap-xxx-123
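
A pool snapshot can be removed again with rmsnap:

ceph osd pool rmsnap xxx snap-xxx-123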

RBD

RBD exposes Ceph as a block device.

Create

Initialize Pool for RBD.

rbd pool init xxx

For security reasons, a separate Ceph user is usually created for each RBD client.

ceph auth get-or-create client.abc mon 'profile rbd' osd 'profile rbd pool=xxx' mgr 'profile rbd pool=xxx'

Create an RBD image.

rbd create --size 1024 xxx/yyy

This creates an image named yyy with a size of 1024 MB in pool xxx.

Status

List the images in a pool.

rbd ls
rbd ls xxx

The default Pool name is rbd.

View image information.

rbd info yyy
rbd info xxx/yyy

Expand capacity

Modify the capacity of an image.

rbd resize --size 2048 yyy
rbd resize --size 512 yyy --allow-shrink

Mount

When mapping an RBD image on another machine, first set up the configuration under /etc/ceph so that the user, key, and MON addresses are available.

Then map the image with rbd.

rbd device map xxx/yyy --id abc

This maps the yyy image in pool xxx as user abc.

You can see the device files under /dev/rbd* or /dev/rbd/ at this point.

List the mapped devices.

rbd device list
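
The mapped device behaves like any other block device. A minimal sketch of putting a filesystem on it and mounting it (assuming the image is empty; the mount point is arbitrary):

mkfs.ext4 /dev/rbd/xxx/yyy
mount /dev/rbd/xxx/yyy /mnt
# when done, unmount and unmap
umount /mnt
rbd device unmap xxx/yyy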

CephFS

Create

If the orchestrator is configured, you can directly use the following command.

ceph fs volume create xxx

Create a CephFS named xxx.

It can also be created manually.

ceph osd pool create xxx_data0
ceph osd pool create xxx_metadata
ceph fs new xxx xxx_metadata xxx_data0

This creates two pools, one for storing metadata and one for storing file data. A CephFS requires one pool for metadata and one or more pools for file data.
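
Additional data pools can be attached to an existing file system later (the pool name xxx_data1 is just an example):

ceph osd pool create xxx_data1
ceph fs add_data_pool xxx xxx_data1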

Once CephFS is created, the corresponding MDS is started.

Status

View the MDS status.

ceph mds stat

Client Configuration

Before mounting CephFS, first configure the client.

Run ceph config generate-minimal-conf in the cluster and it will generate a configuration file.

$ ceph config generate-minimal-conf
# minimal ceph.conf for <fsid>
[global]
        fsid = <fsid>
        mon_host = [v2:x.x.x.x:3300/0,v1:x.x.x.x:6789/0]

Copy the contents to /etc/ceph/ceph.conf on the client. This will allow the client to find the MON address and FSID of the cluster.

Next, we create a user on the cluster for the client.

ceph fs authorize xxx client.abc / rw

This creates a user, client.abc, with read and write access to the / directory of CephFS xxx. Save the output to /etc/ceph/ceph.client.abc.keyring on the client.
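
With ceph.conf and the keyring in place, the client can be sanity-checked before mounting, for example by listing the file systems as that user:

ceph --user abc fs ls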

Mount

mount -t ceph abc@.xxx=/ MOUNTPOINT
# or
mount -t ceph abc@<fsid>.xxx=/ MOUNTPOINT
# or
mount -t ceph abc@<fsid>.xxx=/ -o mon_addr=x.x.x.x:6789,secret=REDACTED MOUNTPOINT
# or
mount -t ceph abc@.xxx=/ -o mon_addr=x.x.x.x:6789/y.y.y.y:6789,secretfile=/etc/ceph/xxx.secret MOUNTPOINT
# or
mount -t ceph -o name=client.abc,secret=REDACTED,mds_namespace=xxx MON_IP:/ MOUNTPOINT

Authenticate as user client.abc and mount the / directory of CephFS xxx at MOUNTPOINT. The mount reads the configuration under /etc/ceph, so options already present in ceph.conf can be left out on the command line.

fsid refers not to the CephFS ID but to the cluster ID, which can be obtained with ceph fsid.
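
For mounting at boot, the same information can go into /etc/fstab; a sketch reusing the names above (the mount point and options are just examples):

abc@.xxx=/  /mnt/cephfs  ceph  mon_addr=x.x.x.x:6789,secretfile=/etc/ceph/xxx.secret,_netdev  0  0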

Quotas

CephFS can place limits on directories.

setfattr -n ceph.quota.max_bytes -v LIMIT PATH
setfattr -n ceph.quota.max_files -v LIMIT PATH
getfattr -n ceph.quota.max_bytes PATH
getfattr -n ceph.quota.max_files PATH

Limits the directory size and number of files; a LIMIT of 0 means no limit.
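
For example, to cap a directory at 100 GiB and then lift the limit again (the path is just a placeholder):

setfattr -n ceph.quota.max_bytes -v 107374182400 /mnt/cephfs/dir
setfattr -n ceph.quota.max_bytes -v 0 /mnt/cephfs/dir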

NFS

You can share out CephFS or RGW by way of NFS.

Start the NFS service.

ceph nfs cluster create xxx
ceph nfs cluster create xxx "host1,host2"

This runs an NFS server on the given host(s); the NFS cluster is named xxx.

View NFS cluster information.

ceph nfs cluster info xxx

List all NFS clusters.

ceph nfs cluster ls

Export CephFS over NFS.

ceph nfs export create cephfs --cluster-id xxx --pseudo-path /a/b/c --fsname some-cephfs-name [--path=/d/e/f] [--client_addr y.y.y.y]

This exports a directory within CephFS; clients access it over NFS via the pseudo path /a/b/c. Access can optionally be restricted to a client IP.

The export can then be mounted on the client.

mount -t nfs x.x.x.x:/a/b/c /mnt
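
Existing exports of an NFS cluster can be listed with ceph nfs export ls (available in recent releases):

ceph nfs export ls xxx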

RadosGW

RGW provides S3 or OpenStack Swift-compatible object storage APIs.

TODO

Orchestrator

Since Ceph needs to run multiple daemons, each in its own container, a system-level orchestrator is typically used to add and manage these containers.

View the current orchestrator.

$ ceph orch status
Backend: cephadm
Available: Yes
Paused: No

The most common backend is cephadm. If cephadm was used during installation, then it is also the orchestrator.

List the services being orchestrated.

ceph orch ls

List the containers (daemons) being orchestrated.

ceph orch ps

List the orchestrated hosts.

ceph orch host ls

Update

Use the container orchestrator to upgrade.

ceph orch upgrade start --ceph-version x.x.x
ceph orch upgrade start --image quay.io/ceph/ceph:vx.x.x

If you can’t find the image on docker hub, pull it from quay.io.

Check the status of the upgrade.

ceph orch upgrade status
ceph -s

View cephadm logs.

ceph -W cephadm

Reference Documentation