Autumn 2014
Caches are hard to get right
@post('/add')
def add():
    ... get name and number from form
    ... add to database

@get('/search/<name>')
def search(name):
    number = mcClient.get(name)
    if (number is None):
        number = riakBucket.get(name)
        mcClient.set(name, number)
    return "Some html with the number"
Like this in groups 2, 3, 4, 6, 7, 9
Caches are hard to get right - let’s try to solve this - attempt 1
@post('/add')
def add():
    ... get name and number from form
    ... remove from memcache
    ... add to database

@get('/search/<name>')
def search(name):
    number = mcClient.get(name)
    if (number is None):
        number = riakBucket.get(name)
        mcClient.set(name, number)
    return "Some html with the number"
/add?name=A&number=0
Add and search simultaneously
| /add?name=A&number=1 | /search?name=A | DB | MC |
|---|---|---|---|
| Get data from form | | (A,0) | (A,0) |
| remove from memcache | | (A,0) | (-,-) |
| | number = mcClient.get(name) | (A,0) | (-,-) |
| | if (number is None): | (A,0) | (-,-) |
| | number = riakBucket.get(name) | (A,0) | (-,-) |
| | mcClient.set(name, number) | (A,0) | (A,0) |
| add to database | | (A,1) | (A,0) |
/search?name=A //returns 0
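The interleaving in the table can be replayed step by step. The following is a minimal sketch, assuming plain dicts as stand-ins for Riak and memcached (`db`, `cache` and `replay_attempt_1` are hypothetical names, not part of the course code):

```python
def replay_attempt_1():
    # Plain dicts stand in for Riak (db) and memcached (cache); this is an
    # assumption made so the trace runs without any server.
    db = {"A": 0}       # state after /add?name=A&number=0
    cache = {"A": 0}    # an earlier search has cached the number

    # /add?name=A&number=1, first half
    cache.pop("A", None)             # remove from memcache

    # /search?name=A runs in full while add is paused
    number = cache.get("A")          # miss: returns None
    if number is None:
        number = db["A"]             # reads the old number 0
        cache["A"] = number          # caches the old number

    # /add, second half
    db["A"] = 1                      # add to database

    return db["A"], cache["A"]


db_number, cached_number = replay_attempt_1()
print(db_number, cached_number)      # the two stores now disagree
```

The cache keeps the old number until the entry is removed or overwritten again, so every later search returns it.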
Caches are hard to get right - let’s try to solve this - attempt 2
@post('/add')
def add():
    ... get name and number from form
    ... add to database
    ... remove from memcache

@get('/search/<name>')
def search(name):
    number = mcClient.get(name)
    if (number is None):
        number = riakBucket.get(name)
        mcClient.set(name, number)
    return "Some html with the number"
/add?name=A&number=0
Add and search simultaneously
| /add?name=A&number=1 | /search?name=A | DB | MC |
|---|---|---|---|
| | number = mcClient.get(name) | (A,0) | (-,-) |
| | if (number is None): | (A,0) | (-,-) |
| | number = riakBucket.get(name) | (A,0) | (-,-) |
| | #number == 0 | (A,0) | (-,-) |
| Get data from form | | (A,0) | (-,-) |
| add to database | | (A,1) | (-,-) |
| remove from memcache | | (A,1) | (-,-) |
| | mcClient.set(name, number) | (A,1) | (A,0) |
/search?name=A //returns 0
You could also do
@post('/add')
def add():
    ... get name and number from form
    ... add to database
    ... set on memcache = overwrite
Groups 11, 12
/add?name=A&number=0
Add and search simultaneously
| /add?name=A&number=1 | /search?name=A | DB | MC |
|---|---|---|---|
| | number = mcClient.get(name) | (A,0) | (-,-) |
| | if (number is None): | (A,0) | (-,-) |
| | number = riakBucket.get(name) | (A,0) | (-,-) |
| | #number == 0 | (A,0) | (-,-) |
| Get data from form | | (A,0) | (-,-) |
| add to database | | (A,1) | (-,-) |
| set on memcache | | (A,1) | (A,1) |
| | mcClient.set(name, number) | (A,1) | (A,0) |
/search?name=A //returns 0
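Both orderings that write the database first lose to the same interleaving: it makes no difference whether add removes the cache entry or overwrites it, because the paused search writes last. A rough sketch, assuming dicts in place of Riak and memcached and generators to pause each handler between steps (all names here are hypothetical):

```python
def add(db, cache, name, number, overwrite):
    """add(): write the database, then remove or overwrite the cache entry."""
    db[name] = number
    yield "add to database"
    if overwrite:
        cache[name] = number
        yield "set on memcache"
    else:
        cache.pop(name, None)
        yield "remove from memcache"


def search(db, cache, name):
    """search(): read-through lookup that pauses after every step."""
    number = cache.get(name)
    yield "number = mcClient.get(name)"
    if number is None:
        number = db[name]
        yield "number = riakBucket.get(name)"
        cache[name] = number
        yield "mcClient.set(name, number)"


def replay(overwrite):
    db, cache = {"A": 0}, {}           # cache entry for A is missing
    s = search(db, cache, "A")
    a = add(db, cache, "A", 1, overwrite)
    next(s)                            # search: cache miss
    next(s)                            # search: reads 0 from the database
    for _ in a:                        # add runs to completion
        pass
    for _ in s:                        # search resumes and writes the old number
        pass
    return db["A"], cache["A"]


print(replay(overwrite=False))         # remove from memcache after the write
print(replay(overwrite=True))          # set on memcache after the write
```

In both runs the database holds 1 while the cache holds 0.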
Caches are hard to get right - let’s try to solve this - attempt by group 5
@post('/add')
def add():
    ... get name and number from form
    ... add to database
    ... remove from memcache

@get('/search/<name>')
def search(name):
    number = mcClient.get(name)
    if (number is None):
        number = riakBucket.get(name)
        mcClient.cas(name, number)  # only do a set if not updated
    return "Some html with the number"
Problem: the teacher could not find any guarantee that the CAS counter is updated on delete. If it is not, stale data can end up in the cache in the following (multi-threaded) case:
/add?name=A&number=1
Add and search simultaneously
| /add?name=A&number=2 | /search?name=A | DB | MC |
|---|---|---|---|
| | number = mcClient.get(name) | (A,1) | (-,-) |
| | if (number is None): | (A,1) | (-,-) |
| | number = riakBucket.get(name) | (A,1) | (-,-) |
| Get data from form | | (A,1) | (-,-) |
| add to database | | (A,2) | (-,-) |
| remove from memcache | | (A,2) | (-,-) |
| | mcClient.cas(name, number) #case1 | (A,2) | (A,1) |
| | mcClient.cas(name, number) #case2 | (A,2) | (-,-) |
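The two cases differ only in whether a delete invalidates the token that search obtained on its cache miss. The sketch below models that with a toy cache; `VersionedCache` and its `delete_invalidates_token` flag are hypothetical and do not describe how memcached or any client library actually behaves:

```python
class VersionedCache:
    """Toy cache with compare-and-set; not the memcached client API."""

    def __init__(self, delete_invalidates_token):
        self.delete_invalidates_token = delete_invalidates_token
        self._data = {}       # key -> value
        self._version = {}    # key -> counter used as the CAS token

    def gets(self, key):
        """Return (value, token); value is None on a miss."""
        return self._data.get(key), self._version.get(key, 0)

    def delete(self, key):
        self._data.pop(key, None)
        if self.delete_invalidates_token:
            self._version[key] = self._version.get(key, 0) + 1

    def cas(self, key, value, token):
        """Store value only if the token is still current."""
        if self._version.get(key, 0) != token:
            return False
        self._data[key] = value
        self._version[key] = token + 1
        return True


def replay(delete_invalidates_token):
    db = {"A": 1}
    cache = VersionedCache(delete_invalidates_token)
    number, token = cache.gets("A")    # search: cache miss
    if number is None:
        number = db["A"]               # search: reads 1
    db["A"] = 2                        # add: add to database
    cache.delete("A")                  # add: remove from memcache
    cache.cas("A", number, token)      # search: conditional set
    return db["A"], cache.gets("A")[0]


print(replay(delete_invalidates_token=False))   # case 1: stale 1 is cached
print(replay(delete_invalidates_token=True))    # case 2: cache stays empty
```

Which of the two behaviours a given memcached client shows on a missing key would have to be checked against its documentation or source.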
In search
if mc.get(name):
    return mc[name]
Or
mc[name] = fetched.encoded_data
return mc[name]
It might be that the data in memcache is removed in between the two calls.
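One way to avoid the second lookup is to read the cache once and return the local value. A minimal sketch, assuming a dict-like cache; `FlakyCache` is a hypothetical stand-in that drops the entry right after the first read to imitate expiry or a concurrent delete:

```python
class FlakyCache(dict):
    """Hypothetical stand-in: the entry disappears right after it is read."""

    def get(self, key, default=None):
        value = super().get(key, default)
        self.pop(key, None)            # simulate expiry or a concurrent delete
        return value


def search_two_reads(mc, name):
    if mc.get(name):
        return mc[name]                # second lookup: the entry may be gone
    return None


def search_one_read(mc, name):
    value = mc.get(name)               # single lookup
    if value is not None:
        return value                   # return what was actually read
    return None


def fill_and_return(mc, name, fetched_data):
    mc[name] = fetched_data
    return fetched_data                # not mc[name]: avoids reading it back


print(search_one_read(FlakyCache(A="1"), "A"))
try:
    search_two_reads(FlakyCache(A="1"), "A")
except KeyError:
    print("entry vanished between the two calls")
```

Comparing against `None` also keeps a cached value such as `0` from being treated as a miss, which `if mc.get(name):` would do.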
I would probably just stream the keys to the client using the provided method. I’m not sure what a better way would be, though this definitely isn’t a smart thing to do, as the number of keys can be quite high.
According to the Riak documentation, using List Keys is not feasible in a production environment, because it causes the database to go through all of its entries. This is slow and requires too many resources to be practical. One way to implement a page that shows a list of files could be to use Riak’s secondary indexes. Secondary indexes are keywords that the database can use to narrow down the search. For a file listing, one could for example use the file type as a secondary index key. Secondary index queries also support range searches and pagination, which would be handy if the number of files is large. Naturally, one should use the streaming capabilities of Riak to get the keys.
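The idea of a paginated range query over a file-type index can be illustrated without a database. The sketch below is an in-memory model only: `FileTypeIndex`, `query` and `list_keys` are hypothetical names and do not correspond to the Riak client API.

```python
class FileTypeIndex:
    """In-memory stand-in for a secondary index on file type (not Riak's API)."""

    def __init__(self):
        self._entries = []                     # sorted (file_type, key) pairs

    def add(self, file_type, key):
        self._entries.append((file_type, key))
        self._entries.sort()

    def query(self, start, end, max_results, continuation=None):
        """Return (keys, continuation) for file types in [start, end]."""
        matches = [e for e in self._entries if start <= e[0] <= end]
        if continuation is not None:
            # Resume after the last entry of the previous page.
            matches = [e for e in matches if e > continuation]
        page = matches[:max_results]
        more = len(matches) > max_results
        return [key for _, key in page], (page[-1] if more else None)


def list_keys(index, start, end, page_size):
    """Yield keys page by page instead of loading all of them at once."""
    continuation = None
    while True:
        keys, continuation = index.query(start, end, page_size, continuation)
        yield from keys
        if continuation is None:
            break


index = FileTypeIndex()
for key, file_type in [("a.png", "png"), ("b.txt", "txt"),
                       ("c.png", "png"), ("d.jpg", "jpg")]:
    index.add(file_type, key)

print(list(list_keys(index, "png", "png", page_size=1)))
```

Only the entries whose file type falls in the requested range are touched, and the caller never holds more than one page of keys at a time.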
The optimizations all seemed to be useful.
As seen in the table, every optimization made sense; the cache and the change from CherryPy to Bottle improved the speed in particular.
| | CherryPy, HTTP, Cache | CherryPy, HTTP, No Cache | CherryPy, PBC, Cache | CherryPy, PBC, No Cache | Bottle, HTTP, Cache | Bottle, HTTP, No Cache | Bottle, PBC, Cache | Bottle, PBC, No Cache |
|---|---|---|---|---|---|---|---|---|
| Netem delay (1000 ms, 5% packet loss) | 2.9/sec | 0.59/sec | 4.7/sec | 0.99/sec | 1.2/sec | 0.2/sec | 2.0/sec | 0.33/sec |
| No Netem delay | 81.7/sec | 59.0/sec | 81.9/sec | 72.5/sec | 83.0/sec | 53.7/sec | 83.7/sec | 57.0/sec |
The observed change depends on the order the optimizations are applied.
The cache only makes sense if the network to the database is slow.
The scp command could be helpful for students who are not very familiar with Linux.
There’s a possibility for SQL injection when giving the keys and values (add and search), but we trust that Riak is built in such a smart manner that they are dealt with internally (since we don’t input actual SQL).
The database is not exposed online; it can only be reached from localhost and from the other virtual machine. It’s not really practical.