Speeding up MySQL query performance on large datasets

With tables at the 500,000-row level, it becomes worth looking into how to improve MySQL's performance. Admittedly, MySQL handles a 500k-row table fine even on default settings, but the slowdown is noticeable to the naked eye, though still within an acceptable range. The notes below are aimed at coping with million-row datasets:

1. Enable MySQL's built-in query cache

In most cases this seems to be enabled by default; let's confirm:

[cce]
# mysql -h localhost -u root -p
Enter password:
mysql>
mysql> use database1
[/cce]
[cce]
mysql> show VARIABLES LIKE '%cache%';

+------------------------------+----------------------+
| Variable_name                | Value                |
+------------------------------+----------------------+
| binlog_cache_size            | 32768                |
| have_query_cache             | YES                  |
| key_cache_age_threshold      | 300                  |
| key_cache_block_size         | 1024                 |
| key_cache_division_limit     | 100                  |
| max_binlog_cache_size        | 18446744073709547520 |
| query_cache_limit            | 1048576              |
| query_cache_min_res_unit     | 4096                 |
| query_cache_size             | 16777216             |
| query_cache_type             | ON                   |
| query_cache_wlock_invalidate | OFF                  |
| table_definition_cache       | 256                  |
| table_open_cache             | 64                   |
| thread_cache_size            | 8                    |
+------------------------------+----------------------+
[/cce]
[cce]
mysql> show status LIKE '%Qcache%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_free_blocks      | 1        |
| Qcache_free_memory      | 32759656 |
| Qcache_hits             | 48678002 |
| Qcache_inserts          | 11871037 |
| Qcache_lowmem_prunes    | 7482829  |
| Qcache_not_cached       | 178368   |
| Qcache_queries_in_cache | 0        |
| Qcache_total_blocks     | 1        |
+-------------------------+----------+

mysql> SET GLOBAL query_cache_size = 32777216;
[/cce]
query_cache_size: the amount of memory, in bytes, allocated to the query cache
query_cache_type: ON means the query cache is enabled

Qcache_lowmem_prunes in the status output counts queries evicted from the cache for lack of memory; at more than 7 million here it signals that the cache is undersized, which is why query_cache_size is raised above.
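Note that SET GLOBAL only lasts until the next server restart. To persist the setting it would go into the MySQL configuration file; a minimal sketch, assuming a typical location such as /etc/my.cnf (path and size are placeholders to adapt):

[cce]
# my.cnf -- make the query cache settings survive restarts
[mysqld]
query_cache_type = 1     # 1 means ON
query_cache_size = 32M   # roughly the budget set via SET GLOBAL above
[/cce]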

 

2. Performance gains from a "backup-delete-restore" scheme

Note: this approach gives efficient access performance over "massive data" that is largely "inactive".

User case: of 1,000,000 rows, only 100,000 belong to users active within the past two weeks, and the chance that the remaining 90% become active again after two weeks is also very low.

Clearly, the method suits databases that store massive amounts of data but serve only light access traffic.

 

  • Write a shell script that automatically backs up the data and then deletes what has been backed up (a cron entry to schedule it is sketched after the script):
[cce]
#!/bin/bash
# author: Wang
# date: 2013-02-12
# find all the users that need to be backed up

# define the Constants
Filename="/pathTo/BackupList.txt"
#Filename="/pathTo/BackupList.txt"

MediaRoot="/pathTo/media"
User="user"
Passwd="passwd"
Host="localhost"
DB="DBname"

# the current time
ctime="$(date +"%Y-%m-%d %H:%M:%S")"

# the backup time for Linux
#btime="$(date -d '-1 weeks' +%Y-%m-%d\ %H:%M:%S)"
# the backup time for Mac
# btime="$(date -v-1w +%Y-%m-%d\ %H:%M:%S)"
btime="$(date -v-2m +%Y-%m-%d\ %H:%M:%S)"
#btime="$(date -d '-2 weeks' +%Y-%m-%d\ %H:%M:%S)"

# the last backup time
# ltime="$(date -v-4m +%Y-%m-%d\ %H:%M:%S)"
# ltime="$(date -d '-4 months' +%Y-%m-%d\ %H:%M:%S)"

# get the list of backup ids
mysql -u$User -p$Passwd -h$Host -D $DB -e "select id from users where active=True and last_login < \"$btime\"" > $Filename

# for every id, output all the sql and delete it.
for id in `cat $Filename | grep -v id`
do
	#echo $id
	# make sure the per-user media directory exists before dumping into it
	if [ -e "$MediaRoot/$id" ]; then
		echo "dump for profile: "$id
	else
		echo "mkdir for profile: "$id
		mkdir "$MediaRoot/$id"
	fi

	if [ -e "$MediaRoot/$id/$id.sql" ]; then
		echo "$id.sql already exists"
	else
		# dump the user's rows, then delete them and mark the user inactive
		mysqldump -u$User -p$Passwd -h$Host $DB paws --where "source_id in (select id from datasources where owner_id=$id)" --no-create-info --lock-all-tables > $MediaRoot/$id/$id.sql
		mysql -u$User -p$Passwd -h$Host -D $DB -e "DELETE from paws where source_id in (select id from datasources where owner_id=$id)" >> /tmp/log.txt
		mysql -u$User -p$Passwd -h$Host -D $DB -e "UPDATE users SET active=False where id=$id" >> /tmp/log.txt
	fi
done
[/cce]
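To run the archiving unattended, the script can be scheduled with cron. A minimal sketch, assuming the script above is saved as /pathTo/backup_inactive.sh (name, schedule and log path are placeholders):

[cce]
# crontab -e: archive inactive users every Sunday at 03:00,
# appending stdout and stderr to a log for later inspection
0 3 * * 0 /bin/bash /pathTo/backup_inactive.sh >> /tmp/backup_cron.log 2>&1
[/cce]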

 

  • When a user logs in again after the two weeks, call the restore routine from Python at login time (a variant that checks for restore failures is sketched after this block):
[cce]
import shlex
import subprocess
from datetime import datetime

if p.active == False:
    # timestamp used to rename the dump after restoring it,
    # so the same file is never imported twice
    file_time = datetime.now().strftime("%Y-%m-%d_%H_%M_%S")
    restore_cmd = shlex.split("mysql -uuser -ppasswd -hlocalhost -D DBname -e 'SOURCE "+MediaRoot+"/"+str(p.id)+"/"+str(p.id)+".sql'")
    backup_cmd = shlex.split("mv "+MediaRoot+"/"+str(p.id)+"/"+str(p.id)+".sql "+MediaRoot+"/"+str(p.id)+"/"+str(p.id)+"_"+file_time+".sql")
    subprocess.Popen(restore_cmd).wait()  # re-import the archived rows
    subprocess.Popen(backup_cmd).wait()   # set the consumed dump aside
    p.active = True
p.last_login = now_time
p.save()
[/cce]
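One caveat with the snippet above: p.active is flipped to True even if the mysql restore fails, stranding the user's archived rows. A hedged variant that checks the exit code first (restore_profile is a hypothetical helper; p, MediaRoot and now_time come from the surrounding code):

[cce]
import os
import shlex
import subprocess

def restore_profile(p, media_root):
    # Re-import a user's archived rows; True only if mysql exits cleanly.
    dump = os.path.join(media_root, str(p.id), str(p.id) + ".sql")
    if not os.path.exists(dump):
        return False  # nothing to restore for this user
    cmd = shlex.split("mysql -uuser -ppasswd -hlocalhost -D DBname -e 'SOURCE " + dump + "'")
    return subprocess.call(cmd) == 0  # 0 is the success exit code

if not p.active and restore_profile(p, MediaRoot):
    p.active = True
[/cce]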

	
