Wednesday, December 17, 2014

Cassandra and Aerospike, the one million TPS war



Dec 2nd, 2011: Netflix first pushed one million writes per second onto 288 EC2 m1.xlarge nodes with triple replication, at $0.078 per million writes.
The war began!
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

July 19th, 2012: Netflix re-ran the benchmark on 12 SSD-based EC2 nodes with triple replication, at $0.013 per million writes.
http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html

March 20th, 2014: Google hit the mark with 330 Cassandra nodes on Compute Engine (GCE). No hardware spec was given.
http://googlecloudplatform.blogspot.sg/2014/03/cassandra-hits-one-million-writes-per-second-on-google-compute-engine.html

December 3rd, 2014: Aerospike made it with just 50 nodes, at $0.01 per million writes.
http://www.aerospike.com/blog/1m-wps-6x-fewer-servers-than-cassandra

To be continued...

Tuesday, August 5, 2014

MySQL IN-subquery performance pitfall

Suppose the ACCOUNT table has UUID as its primary key. Then the following SQL should obviously return quickly, right?

mysql> EXPLAIN select * from ACCOUNT WHERE UUID IN('43d92fe6-2379-4697-9ad9-74e14bb6e3e7');
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table   | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | ACCOUNT | const | PRIMARY       | PRIMARY | 110     | const |    1 |       |
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------+

1 row in set (0.01 sec)

Now suppose I join against another table, ACCOUNT_LINK, where UUID is also indexed. That should be fast too, right?

mysql> EXPLAIN select * from ACCOUNT INNER JOIN ACCOUNT_LINK ON ACCOUNT.UUID=ACCOUNT_LINK.UUID  WHERE  AUTH_DOMAIN='snda' and ACCOUNT_LINK.LOGINNAME='danieltest001';
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+---------------------------+------+-------------+
| id | select_type | table        | type   | possible_keys                                       | key           | key_len | ref                       | rows | Extra       |
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+---------------------------+------+-------------+
|  1 | SIMPLE      | ACCOUNT_LINK | ref    | PRIMARY,idx_loginname,IDX_ACC_LINK_DOMAIN_LOGINNAME | idx_loginname | 272     | const                     |    1 | Using where |
|  1 | SIMPLE      | ACCOUNT      | eq_ref | PRIMARY                                             | PRIMARY       | 110     | account.ACCOUNT_LINK.UUID |    1 |             |
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+---------------------------+------+-------------+
2 rows in set (0.05 sec)

But if I merely rewrite the JOIN as an IN subquery, it should still be fast, right? The subquery returns only a single row (extremely selective), and the outer query can hit the primary key. Yet MySQL scans the outer table first. Unbelievable!

mysql> EXPLAIN select * from ACCOUNT WHERE UUID IN(select UUID from ACCOUNT_LINK WHERE  AUTH_DOMAIN='snda' and LOGINNAME='danieltest001');
+----+--------------------+--------------+-----------------+-----------------------------------------------------+---------+---------+------------+---------+-------------+
| id | select_type        | table        | type            | possible_keys                                       | key     | key_len | ref        | rows    | Extra       |
+----+--------------------+--------------+-----------------+-----------------------------------------------------+---------+---------+------------+---------+-------------+
|  1 | PRIMARY            | ACCOUNT      | ALL             | NULL                                                | NULL    | NULL    | NULL       | 8493047 | Using where |
|  2 | DEPENDENT SUBQUERY | ACCOUNT_LINK | unique_subquery | PRIMARY,idx_loginname,IDX_ACC_LINK_DOMAIN_LOGINNAME | PRIMARY | 292     | func,const |       1 | Using where |
+----+--------------------+--------------+-----------------+-----------------------------------------------------+---------+---------+------------+---------+-------------+
2 rows in set (0.00 sec)
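
The DEPENDENT SUBQUERY line is the clue: before MySQL 5.6, the optimizer rewrites an IN (SELECT ...) predicate into a correlated EXISTS, so it full-scans the outer table (the 8,493,047-row ALL access above) and re-executes the subquery once per row. MySQL 5.6 added semi-join and subquery-materialization strategies that handle this case properly. On older versions, a common workaround is to force the subquery to be materialized first via a derived table; a sketch, assuming the same schema as the queries above:

```sql
-- What MySQL (pre-5.6) effectively executes: a correlated EXISTS,
-- evaluated once per row of the full ACCOUNT scan.
SELECT * FROM ACCOUNT
WHERE EXISTS (SELECT 1 FROM ACCOUNT_LINK
              WHERE AUTH_DOMAIN = 'snda'
                AND LOGINNAME = 'danieltest001'
                AND ACCOUNT_LINK.UUID = ACCOUNT.UUID);

-- Workaround: wrap the subquery in a derived table so it is
-- materialized first, then joined back via ACCOUNT's primary key.
SELECT a.*
FROM ACCOUNT a
JOIN (SELECT UUID
      FROM ACCOUNT_LINK
      WHERE AUTH_DOMAIN = 'snda'
        AND LOGINNAME = 'danieltest001') t
  ON a.UUID = t.UUID;
```

The derived-table form produces essentially the same plan as the explicit INNER JOIN shown earlier: one ref lookup on ACCOUNT_LINK, then one eq_ref primary-key lookup on ACCOUNT.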