utorak, 27. ožujka 2012.

MySQL Stored Procedure - application example

USER_ACTIVITY

user_id nightmorningdayeveninglast_update
1 4.62.43.68.62012-03-26 16:22:42
214.29.98.210.42012-03-26 07:45:22
31.26.36.513.12012-03-25 22:12:55
43.12.65.22.92012-03-24 11:34:12
...

distribution of hours:
night = 21h-5h
morning = 5h - 10h
day = 10h - 16h
evening = 16h-21h


-------------------------------------------------------------------------------------------------

Say you need to make record of users' activity. Each time you check upon a user and see that he was active since you last checked him, you add 1 point to his activity.

Example:
Current time is: 2012-03-26 20:42:38
We use some other service (not important which one) to find out whether user was active since we last checked him or not.
For simplicity, we assume that all users had some activity since the last time they were checked.

I check upon user#1.
I need to distribute 1 point of activity across the times of day.
Since it was the same evening since we last updated this user, we add 1 point to 'evening activity'.

Then i check upon user#2.
Since his last update 12.95 hours passed. 2.25 hours of that time was in the morning, 6 hours were during the day and 4.7 hours was during the evening. In the same ratio, 1 point should be distributed to the day parts (morning, day, evening). This means: 0.174 to 'morning', 0.463 to 'day' and 0.363 to 'evening'.

In the case of user#4 we would like to do something like this:
Since last update of this user 57,133 hours passed. We create variable hour_inc = 1/57.133.
Then we add (hour_inc*8) to 'night', (hour_inc*5) to 'morning', (hour_inc*6) to 'day' and (hour_inc*5) to 'evening'.
Sumation of these increments in this case is less than 1.

So if we want to do this for arbitrary 100 users from this table and we want to update existing information based on the information we just calculated, how would we do it?
Is it possible to do it with one with one SQL statement?
I don't think so... if you know let me know...

So the first thing we do. We open our programming language of choice and make a function that will do this for us:
{
    1) do a SELECT query to fetch needed data
    2) iterate through users and calculate new activities for each one
    3) after that, we do a 100 of consecutive update statements. Each update statement is different because 'last_update' time for each user is different
}

The other approach is using stored procedures.
We make a procedure that uses a cursor and iterates through all users.
For each of them builds up a big UPDATE statement (string) using concatenation in combination with some form of loop.
After a statement is built, we use PREPARE-EXECUTE to preform a statement which was up to now in a some variable of type VARCHAR(1500).

The thing that I observed is that using this approach 20000 users gets correctly updated in up to 7 seconds.
Then I modified the stored procedure so that it doesn't apply those update statements but instead prints them out using SELECT command.
I copied one of update statements, made a complement update statement that annuls first one's effect.
Then I copied this pair many times until I had 500 update queries. I executed them and waited 34 seconds before it finished.
That is the duration of merely updating information in database, not including the time it took to calculate this data that is being put in database. While the stored procedure does the calculation + update within 7 seconds for 20000 users.

If I did everything right, THAT is pretty impressive.

Nema komentara:

Objavi komentar