Using GlusterFS to build a highly available file server

A project required high availability for our file server. One approach I had considered was nfs+rsync+inotify+keepalived, but it has real problems: once the primary NFS server dies, every client that has the export mounted has to remount it by hand.

So I turned to distributed file systems instead; there are plenty of comparison articles online. I chose GlusterFS mainly because its deployment is flexible, it has no single point of failure, and clients can mount it directly (via FUSE).

Below is the actual deployment. The operating system is CentOS 5.6 x86_64.

Client IP: 192.168.0.201
Server IPs: 192.168.0.202, 192.168.0.203

Directory exported by the servers: /home/filecluster
Mount point on the client: /home/filecluster

Add the following entries to /etc/hosts on all three machines:

192.168.0.201	xen1
192.168.0.202	xen2
192.168.0.203	xen3

First, download fuse, glusterfs, and python-ctypes:

wget http://download.gluster.com/pub/gluster/glusterfs/3.2/LATEST/glusterfs-3.2.2.tar.gz
wget "http://downloads.sourceforge.net/project/fuse/fuse-2.X/2.8.5/fuse-2.8.5.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Ffuse%2Ffiles%2Ffuse-2.X%2F2.8.5%2F&ts=1313661051&use_mirror=cdnetworks-kr-2"
wget http://download.fedora.redhat.com/pub/epel/5/x86_64/python-ctypes-1.0.2-2.el5.x86_64.rpm

Install python-ctypes, fuse, and glusterfs:

rpm -ivh python-ctypes-1.0.2-2.el5.x86_64.rpm
tar zxvf fuse-2.8.5.tar.gz && cd fuse* && ./configure && make && make install
tar zxvf glusterfs-3.2.2.tar.gz && cd glusterfs* && ./configure --enable-fusermount && make && make install

The installation automatically generates /etc/init.d/glusterd.

Load the fuse module now and at every boot:

echo "modprobe fuse" > /etc/sysconfig/modules/fuse.modules
chmod 755 /etc/sysconfig/modules/fuse.modules
modprobe fuse

Add glusterd to the services started at boot:

chkconfig glusterd on

Edit the /etc/init.d/glusterd script.

Server side:

#!/bin/bash
#
# chkconfig: 35 90 12
# description: Gluster File System service for volume management
#

# Get function from functions library
. /etc/rc.d/init.d/functions

BASE=glusterd
GLUSTERFSD=glusterfsd
GLUSTERFS=glusterfs
GLUSTERD_BIN=/usr/local/sbin/$BASE
GLUSTERD_OPTS="-l /var/log/glusterfs.log -f /usr/local/etc/glusterfs/glusterfsd.vol"
GLUSTERD="$GLUSTERD_BIN $GLUSTERD_OPTS"
RETVAL=0

# Start the service $BASE
start()
{
       echo -n $"Starting $BASE:"
       daemon $GLUSTERD
       RETVAL=$?
       echo
       [ $RETVAL -ne 0 ] && exit $RETVAL
}

# Stop the service $BASE
stop()
{
       echo -n $"Stopping $BASE:"
       killproc $BASE
       echo
       pidof -c -o %PPID -x $GLUSTERFSD &> /dev/null
       [ $? -eq 0 ] &&  killproc $GLUSTERFSD &> /dev/null

       #pidof -c -o %PPID -x $GLUSTERFS &> /dev/null
       #[ $? -eq 0 ] &&  killproc $GLUSTERFS &> /dev/null

       if [ -f /etc/glusterd/nfs/run/nfs.pid ]; then
           pid=`cat /etc/glusterd/nfs/run/nfs.pid`
           cmd=`ps -p $pid -o comm=`

           # Quote $cmd: if the pid is already gone, an unquoted empty
           # value would make this test a syntax error.
           if [ "$cmd" == "glusterfs" ]; then
               kill $pid
           fi
       fi
}

### service arguments ###
case $1 in
 start)
       start
       ;;
 stop)
       stop
       ;;
 status)
       status $BASE
       ;;
 restart)
       $0 stop
       $0 start
       ;;
 *)
       echo $"Usage: $0 {start|stop|status|restart}."
       exit 1
esac

exit 0

Client side:

#!/bin/bash
#
# chkconfig: 35 90 12
# description: Gluster File System service for volume management
#

# Get function from functions library
. /etc/rc.d/init.d/functions

BASE=glusterd
GLUSTERFSD=glusterfsd
GLUSTERFS=glusterfs
GLUSTERD_BIN=/usr/local/sbin/$GLUSTERFS
GLUSTERD_OPTS="-l /var/log/glusterfs.log -f /usr/local/etc/glusterfs/glusterfs.vol /home/filecluster"
GLUSTERD="$GLUSTERD_BIN $GLUSTERD_OPTS"
RETVAL=0

# Start the service $BASE
start()
{
       echo -n $"Starting $GLUSTERFS:"
       daemon $GLUSTERD
       RETVAL=$?
       echo
       [ $RETVAL -ne 0 ] && exit $RETVAL
}

# Stop the service $BASE
stop()
{
       echo -n $"Stopping $GLUSTERFS:"
       killproc $GLUSTERFS
       echo
       pidof -c -o %PPID -x $GLUSTERFSD &> /dev/null
       [ $? -eq 0 ] &&  killproc $GLUSTERFSD &> /dev/null

       #pidof -c -o %PPID -x $GLUSTERFS &> /dev/null
       #[ $? -eq 0 ] &&  killproc $GLUSTERFS &> /dev/null

       if [ -f /etc/glusterd/nfs/run/nfs.pid ]; then
           pid=`cat /etc/glusterd/nfs/run/nfs.pid`
           cmd=`ps -p $pid -o comm=`

           # Quote $cmd: if the pid is already gone, an unquoted empty
           # value would make this test a syntax error.
           if [ "$cmd" == "glusterfs" ]; then
               kill $pid
           fi
       fi
}

### service arguments ###
case $1 in
 start)
       start
       ;;
 stop)
       stop
       ;;
 status)
       status $BASE
       ;;
 restart)
       $0 stop
       $0 start
       ;;
 *)
       echo $"Usage: $0 {start|stop|status|restart}."
       exit 1
esac

exit 0

Edit /usr/local/etc/glusterfs/glusterfsd.vol on the servers. The only difference between the two servers is the option bind-address line; everything else is identical.

### file: server-volume.vol.sample

#####################################
###  GlusterFS Server Volume File  ##
#####################################

#### CONFIG FILE RULES:
### "#" is comment character.
### - Config file is case sensitive
### - Options within a volume block can be in any order.
### - Spaces or tabs are used as delimiter within a line.
### - Multiple values to options will be : delimited.
### - Each option should end within a line.
### - Missing or commented fields will assume default values.
### - Blank/commented lines are allowed.
### - Sub-volumes should already be defined above before referring.

### Export volume "brick" with the contents of "/home/export" directory.
volume brick
  type storage/posix                   # POSIX FS translator
  option directory /home/filecluster        # Export this directory
end-volume
volume locker
  type features/posix-locks
  subvolumes brick
end-volume

### Add network serving capability to above brick.
volume server
  type protocol/server
  option transport-type tcp/server
# option transport-type unix
# option transport-type ib-sdp
 option bind-address 192.168.0.202     # Default is to listen on all interfaces; use 192.168.0.203 on xen3
# option listen-port 9999

# option transport-type ib-verbs
# option transport.ib-verbs.bind-address 192.168.1.10     # Default is to listen on all interfaces
# option transport.ib-verbs.listen-port 24016
# option transport.ib-verbs.work-request-send-size  131072
# option transport.ib-verbs.work-request-send-count 64
# option transport.ib-verbs.work-request-recv-size  131072
# option transport.ib-verbs.work-request-recv-count 64

# option client-volume-filename /etc/glusterfs/glusterfs-client.vol
  subvolumes brick
# NOTE: Access to any volume through protocol/server is denied by
# default. You need to explicitly grant access through # "auth"
# option.
  option auth.addr.brick.allow 192.168.0.* # Allow access to "brick" volume
  option auth.addr.locker.allow 192.168.0.* # Allow access to "locker" volume
end-volume

Edit /usr/local/etc/glusterfs/glusterfs.vol on the client:

### Add client feature and attach to remote subvolume
volume xen2
  type protocol/client
  option transport-type tcp/client
  option remote-host xen2
  option remote-port 24007
  option remote-subvolume locker       #name of the remote volume
end-volume

volume xen3
  type protocol/client
  option transport-type tcp/client
  option remote-host xen3
  option remote-port 24007
  option remote-subvolume locker
end-volume

#volume replicate2
#  type cluster/replicate
#  subvolumes xen2
#end-volume
#
#volume replicate3
#  type cluster/replicate
#  subvolumes xen3
#end-volume
#
volume bricks
  type cluster/replicate
  subvolumes xen2 xen3
#  subvolumes replicate1
end-volume
#
#volume writebehind
#  type performance/write-behind
#  option cache-size 1MB
#  subvolumes distribute
#end-volume
#
#volume cache
#  type performance/io-cache
#  option cache-size 64MB
#  subvolumes writebehind
#end-volume

Finally, start the glusterd service on the servers:

/etc/init.d/glusterd start
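Before starting the client it is worth confirming that the servers are actually accepting connections. A minimal sketch, assuming the default glusterd listen port 24007 and the host names from /etc/hosts above:

```shell
# Poll a TCP port using bash's /dev/tcp redirection; returns non-zero
# if the port never opens within the given number of attempts.
wait_port() {
  local host=$1 port=$2 tries=${3:-10}
  local i
  for i in $(seq "$tries"); do
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

wait_port xen2 24007 3 && echo "xen2 glusterd is up" || echo "xen2 glusterd is not reachable"
```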

Then start the service on the client:

/etc/init.d/glusterd start

Now df on the client shows something like this:

[root@xen1 filecluster]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              29G  3.5G   24G  13% /
tmpfs                 512M     0  512M   0% /dev/shm
glusterfs#/usr/local/etc/glusterfs/glusterfs.vol
                       29G  3.3G   25G  12% /home/filecluster
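When scripting against the mount it is safer to check that the FUSE mount is actually live rather than assume it. A small sketch that matches the mount point and filesystem type in /proc/mounts:

```shell
# Return 0 if the given directory is currently a mounted glusterfs
# volume (the fs type shows up as glusterfs / fuse.glusterfs).
is_gluster_mount() {
  awk -v mp="$1" '$2 == mp && $3 ~ /glusterfs/ { found = 1 }
                  END { exit !found }' /proc/mounts
}

is_gluster_mount /home/filecluster && echo mounted || echo "not a glusterfs mount"
```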

I then ran a quick comparison against NFS:

   1. NFSv4
         1. dd if=/dev/zero of=xen.img bs=1M count=500
            524288000 bytes (524 MB) copied, 13.9683 seconds, 37.5 MB/s
         2. dd if=/dev/zero of=xen.img bs=1M count=32
            33554432 bytes (34 MB) copied, 0.710816 seconds, 47.2 MB/s
   2. gluster
         1. dd if=/dev/zero of=xen.img bs=1M count=500
            524288000 bytes (524 MB) copied, 18.4192 seconds, 28.5 MB/s
         2. dd if=/dev/zero of=xen.img bs=1M count=32
            33554432 bytes (34 MB) copied, 0.591001 seconds, 56.8 MB/s
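The dd runs above are easy to repeat on any mount with a small wrapper. A sketch; /tmp is just a stand-in here, substitute /home/filecluster (or an NFS mount) to reproduce the numbers above:

```shell
# Write `count` 1 MiB blocks of zeros into `dir`, print dd's summary
# line (bytes, time, throughput), and remove the test file afterwards.
bench_write() {
  local dir=$1 count=$2
  dd if=/dev/zero of="$dir/bench.img" bs=1M count="$count" 2>&1 | tail -n 1
  rm -f "$dir/bench.img"
}

bench_write /tmp 32
```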

Of course more than two servers could serve as the backend, but since I am using replication, adding more is wasteful: in replicated mode every server holds a full copy, so total capacity stays equal to that of a single server.

Gluster offers the following volume types:

1. Distributed Volumes
2. Replicated Volumes
3. Striped Volumes
4. Distributed Striped Volumes
5. Distributed Replicated Volumes
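For comparison, a distributed-replicated layout in the same 3.2-era volfile style stacks cluster/distribute on top of two replicate pairs. This is a sketch only: xen4 and xen5 are hypothetical extra bricks, not part of this setup.

```
volume repl-a
  type cluster/replicate
  subvolumes xen2 xen3
end-volume

volume repl-b
  type cluster/replicate
  subvolumes xen4 xen5
end-volume

volume dist
  type cluster/distribute
  subvolumes repl-a repl-b
end-volume
```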

Any system going to production needs failure testing. The focus here is on server-side failures, since that is where all the data will eventually live. Let's simulate a server dying while the client is writing:
start a large write on the client, and while it is running, stop the glusterd service on xen2.

dd if=/dev/zero of=xen1.img bs=1M count=500

When the client finishes writing, compare the size of xen1.img on the three machines:

[root@xen1 filecluster]# ll
total 512508
-rw-r--r-- 1 root root 524288000 Aug 24 15:33 xen1.img

[root@xen2 filecluster]# ll
total 241652
-rw-r--r-- 1 root root 247201792 Aug 24 15:32 xen1.img

[root@xen3 filecluster]# ll
total 512508
-rw-r--r-- 1 root root 524288000 Aug 24 15:33 xen1.img

xen2's copy is the wrong size. Now start the glusterd service on xen2 again.

After it comes back up the sizes still look the same as before. But if we run this on the client:

[root@xen1 filecluster]# ll

then the file size on xen2 becomes correct as well.

But does this still work when directories are involved? On the client:

mkdir file
cd file
dd if=/dev/zero of=xen1.img bs=1M count=500

With all other conditions the same, running ls only in /home/filecluster itself leaves xen2's data under the new directory stale.
So whenever a server has gone down, run the following to force a complete resync:

[root@xen1 filecluster]# find ./ -name "*" | xargs ls
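The find | xargs ls trick works because reading every entry through the client mount triggers self-heal on it, but it breaks on filenames containing spaces. A slightly more robust sketch of the same idea:

```shell
# Walk every file and directory under the mount and stat each entry
# through the client, forcing self-heal on it; NUL-separated so
# filenames with spaces or newlines are handled correctly.
heal_walk() {
  find "$1" -print0 | xargs -0 stat > /dev/null
}

# Usage (on the client): heal_walk /home/filecluster
```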

###########################################

Best regards
Timo Seven
blog:http://www.timoseven.com
twitter: http://twitter.com/zauc
Linux System Admin & MySQL DBA

