A shell script for analyzing the latest housing prices on Anjuke

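I wanted a way to track the latest housing prices. The prices listed on a site like Anjuke are not very accurate, but once some historical data has accumulated, analyzing it can still yield useful information. The output looks like this: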

http://beijing.anjuke.com/viewprop-act-sell-id-34033853.html 13366569581 南北 美联物业 亚运新新家园朗月园 1450万元 5室3厅 365平米 39726元 共3层 2008
http://beijing.anjuke.com/viewprop-act-sell-id-34033627.html 13161862633 南北 润万嘉 远洋·万和城 760万元 3室1厅 163平米 46626元 8/12 2005 

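Column 1 is the listing URL, column 2 the agent's mobile number, column 3 the orientation of the home, column 4 the agency, column 5 the estate name, column 6 the total price (in 万元, units of 10,000 yuan), column 7 the layout, column 8 the floor area (平米, square meters), column 9 the unit price per square meter, column 10 the floor, and column 11 the year built.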
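As a quick illustration, individual fields can be pulled out of these rows by position with awk. This is only a minimal sketch: the field positions are read off the two sample rows above (11 space-separated fields each), and the variable names `rows` and `summary` are made up for the example.

```shell
#!/bin/bash
# The two sample rows from the output above (space-separated, 11 fields each).
rows='http://beijing.anjuke.com/viewprop-act-sell-id-34033853.html 13366569581 南北 美联物业 亚运新新家园朗月园 1450万元 5室3厅 365平米 39726元 共3层 2008
http://beijing.anjuke.com/viewprop-act-sell-id-34033627.html 13161862633 南北 润万嘉 远洋·万和城 760万元 3室1厅 163平米 46626元 8/12 2005'

# For every complete row print: estate name ($5), total price ($6),
# floor area ($8) and year built ($11).
summary=$(printf '%s\n' "$rows" | awk 'NF == 11 { print $5, $6, $8, $11 }')
printf '%s\n' "$summary"
```

Rows that do not have exactly 11 fields are skipped, so a timestamp line or a row that has not yet been enriched with phone, orientation and agency would not shift the columns.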

The full shell script follows. The resulting CSV file can be plotted with Excel, Python, or MATLAB.

#!/bin/bash
cd /home/timo/house

day=`date +%F-%H:%M`
today=`date +%F`
wget --header="Host: beijing.anjuke.com" --user-agent=Mozilla/5.0 http://203.166.162.82/v2/sale/W0QQdsmZmmQQmodeZ1QQs5Zall -O anjuke.txt

grep "房龄\|viewprop-act-sell-id" anjuke.txt | grep -v option > result.txt

sed '{N;s/\n/ /}' -i result.txt

if [ ! -f today.anjuke.$today.csv ] ;
then
	touch today.anjuke.$today.csv
fi

templink=`awk -F'"' '{print $2}' result.txt | tail -n 1`
tempid=`grep -c "$templink" today.anjuke.$today.csv`

if [ $tempid -le 0 ] ;  then
	echo "$day" >> today.anjuke.$today.csv
	awk '{print $2,$3,$11}' result.txt  | awk -F'"' '{print $2,$4,$5}' | sed 's#<em>\|</em>\|,\|单价:\|楼层:\|房龄:# #g' | sort -k 5 >> today.anjuke.$today.csv
fi

sed -e 's#二手房# #g' -i today.anjuke.$today.csv


for linenum in {1..25}
do
temp_phone=`awk '{print $2}' today.anjuke.$today.csv | tail -n 25 |  sed -n "$linenum"p| egrep "^[0-9]+" | wc -l`
if [ $temp_phone -le 0 ] ;
then
	for x in `awk '{print $1}' today.anjuke.$today.csv | tail -n 25 | sed -n "$linenum"p`
	do
		wget --user-agent=Mozilla/5.0 $x -O phone.txt
		phone=`egrep -e '"telphone">[0-9]+' phone.txt  | sed -e 's#<div class="telphone">\|</div>\|\t##g'`
		orien=`grep '朝向'  phone.txt  | grep -w li | sed -r 's#[a-z]|<|>|"|\/|=|\t|朝向:##g'`
		shop=`grep 'images.anjukestatic.com/broker' phone.txt  | awk -F'"' '{print $12}' | awk '{print $1}'`
		sed -e "s#$x#$x $phone $orien $shop#g" -i today.anjuke.$today.csv
	done
fi
done
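Note that the file the script writes is space-separated despite its .csv name, and each run also appends a timestamp line. For Excel or Python it may be easier to work with a real comma-separated copy. The following is a minimal sketch, not part of the original script: the helper name `to_csv` and the sample file are hypothetical, and it assumes a complete row is one with more than 10 fields, the same filter used for the average further down.

```shell
#!/bin/bash
# Hypothetical helper: print only complete rows (more than 10 fields) of the
# space-separated file, rewritten with commas between the fields.
to_csv() {
    # Assigning $1 to itself makes awk rebuild the record using OFS.
    awk 'BEGIN { OFS = "," } NF > 10 { $1 = $1; print }' "$1"
}

# Small sample file: a timestamp line, one complete row, one incomplete row.
sample=$(mktemp)
printf '%s\n' \
    '2010-08-24-10:00' \
    'http://beijing.anjuke.com/viewprop-act-sell-id-34033853.html 13366569581 南北 美联物业 亚运新新家园朗月园 1450万元 5室3厅 365平米 39726元 共3层 2008' \
    'http://beijing.anjuke.com/viewprop-act-sell-id-34033627.html 远洋·万和城 760万元 3室1厅 163平米 46626元 8/12 2005' \
    > "$sample"

csv_out=$(to_csv "$sample")
printf '%s\n' "$csv_out"
rm -f "$sample"
```

In real use the output would be redirected to a new file, for example `to_csv today.anjuke.2010-08-24.csv > clean.csv`, where `clean.csv` is just an illustrative name.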

Now compute the average total price across all listings. I had no idea what to expect, and the result gave me quite a shock:

awk 'NF > 10' today.anjuke.2010-08-24.csv | awk '{print $6}' | awk -F'万元' '{print $1}' | awk 'BEGIN {sum=0} {sum+=$0} END{print sum/FNR}'
283.187

That's it; other statistics can be computed by following the same pattern. awk turns out to be remarkably powerful.
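For instance, the average unit price and average floor area could be computed in the same pass as the average total price. The following is a sketch under the column layout of the sample rows shown earlier ($6 total price, $8 area, $9 unit price); the function name `house_stats` and the sample file are hypothetical. It relies on awk's string-to-number conversion, which uses the leading digits of a field such as `1450万元` and ignores the unit suffix.

```shell
#!/bin/bash
# Hypothetical helper: averages over all complete rows (more than 10 fields).
house_stats() {
    awk 'NF > 10 {
        # Adding 0 forces numeric conversion; the trailing unit text is ignored.
        price += $6 + 0    # total price, in 万元
        area  += $8 + 0    # floor area, in 平米
        unit  += $9 + 0    # unit price, in 元 per 平米
        n++
    }
    END {
        if (n) printf "%d rows, avg total %.2f, avg unit %.2f, avg area %.2f\n", n, price / n, unit / n, area / n
    }' "$1"
}

# Try it on the two sample rows from the top of the post.
sample=$(mktemp)
printf '%s\n' \
    'http://beijing.anjuke.com/viewprop-act-sell-id-34033853.html 13366569581 南北 美联物业 亚运新新家园朗月园 1450万元 5室3厅 365平米 39726元 共3层 2008' \
    'http://beijing.anjuke.com/viewprop-act-sell-id-34033627.html 13161862633 南北 润万嘉 远洋·万和城 760万元 3室1厅 163平米 46626元 8/12 2005' \
    > "$sample"

stats=$(house_stats "$sample")
printf '%s\n' "$stats"
rm -f "$sample"
```

Counting rows in a separate variable `n` rather than dividing by the record count keeps timestamp lines and incomplete rows out of the denominator.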

###########################################

Best regards
Timo Seven
blog: http://www.timoseven.com
twitter: http://twitter.com/zauc
Linux System Admin & MySQL DBA
