Thursday, May 28, 2009

Resuming transmission...

I've been busy lately: preparing papers and presentations for the upcoming ODTUG, presenting at Hotsos-revisited, and presenting at the Dutch DBA-symposium. All spare time went into these activities, and the Helsinki blog just had to wait in line. But now I intend to resume transmission over here again.

A couple of weeks ago I received following comment on this blog:
toon,

Suppose we have a registration form on our web site, and the web guys want to check if the username supplied in the form is already in use.

This would be a stored procedure in my UI-api, right ? (one in parameter and a return value to represent true or false)

The quick answer was, of course, yes. But then I started thinking about how I would implement this. We would like to have a robust solution, and for this there are a few things you need to be aware of.
  1. There is some time in between the ‘in-use check’ and the final ‘user registration is executed’ moment. Other users could be checking for the same username during this time.
  2. We are dealing with (stateless) browser users. They can just stop their “user registration” process by closing the browser or moving on to another url.
  3. You need to cater for concurrency.
After some careful thinking here is the outline of a “start” solution as I would have designed it.

Obviously there would be some users table that holds already registered (i.e. in-use) usernames.
create table web_user
(userid number not null
,username varchar2(30) not null check(username = upper(username))
,...more columns...
,primary key (userid)
,unique(username)
);
In addition to that I would introduce a table to hold “in-use check” calls.
create table web_user_inuse_check
(sessionid varchar2(...) not null
,username varchar2(30) not null check(username = upper(username))
,checktime date not null
,primary key (sessionid)
,unique (username)
);
The meaning of this table would be: “username [username] was found to be not in use when checked by session [sessionid] at [checktime].” The sessionid would likely map to the value of the http sesssion cookie that is maintained on the middle tier for every connected browser.

This design would have an additional constraint as follows (specified as an SQL assertion):
create assertion no_overlap as
check(not exists (select username from web_user
intersect
select username from web_user_inuse_check));
Which would be implemented separately using triggers on both tables (not shown in this post).

Next, two procedures would be involved: one for the “inuse” check, and one for the actual creation of a “not-yet-used” username. (in pseudo code)
procedure p_check_already_used
(p_username in varchar2
,p_sessionid in varchar2
,p_inuse out varchar2) as
begin
--
-- Serialize calls to this procedure.
--
Appropriate dbms_lock call to serialize *per username*.
Will raise exception if lock could not be acquired.
--
-- Possible scenarios:
-- 1) username already in use.
-- 2) username not in use and also no inuse_check available.
-- 3) username not in use and inuse_check available by other session.
-- 3a) inuse_check is more than 15 minutes ago.
-- 3b) inuse_check is 15 minutes or less ago.
-- 4) username not in use and inuse_check available by same session.
--
If [scenario 1]
Then p_inuse := ‘YES’
Elsif [scenario 2]
Then create (using merge on sessionid) inuse_check row for this session;
P_inuse := ‘NO’;
Elsif [scenario 3a]
Then delete inuse_check row of other session;
Create (merge) inuse_check row for this session;
P_inuse := ‘NO’;
Elsif [scenario 3b]
Then p_inuse := ‘YES’;
Elsif [scenario 4]
Then P_inuse := ‘NO’;
End if;
--
-- Commit transaction (also releases dbms_lock lock).
--
Commit;
--
Exception when “lock could not be acquired”
--
P_inuse := ‘YES’;
--
end;
And finally the user-creation procedure.
procedure p_create_user
(p_sessionid in varchar2
,p_username in varchar2
,other user data) as
Begin
--
If “Our (sessionid!) inuse_check row has disappeared”
Then -- It took us more than 15 minutes to fill in the form, and
-- some other session has claimed this username now.
--
-- Quickly check if we can claim it again.
--
If p_check_already_used = ‘YES’
Then error
End if;
End if;
--
-- Now register our user.
--
Delete our inuse_check row;
Insert web_user row;
Commit;
--
End;
I know this may seem an overly complex solution. And I would probably check with the customer first whether dealing with the “concurrency scenarios” is really necessary or not.

Final remark: I would put both procedures inside a package (BL-code), and create two stub procedures on top of the package (UI API’s) which would be granted (execute) to the schema used by the yafet to connect to the DBMS.

2 comments:

  1. Toon,

    You use dbms_lock (here and in your book) as a serialisation mechanism.

    Typically,I see SELECT FOR UPDATE used to serialise access to table/s. Do you use this ? Is this because it usually implies an embedded procedural approach rather than triggered procedural ?

    ReplyDelete
  2. Peter,

    I use dbms_lock in two scenarios:

    1) To serialize wBL (transactional business logic) that I know would likely fail were it not serialized in some way.
    2) To serialize DL (Data Logic that validates constraints).

    I do not use 'select for update' in these scenarios, because:
    3) It is often too strong, i.e. it serializes more transactions than is necessary.
    4) It is sometimes just not possible to serialize with 'select for update' as is required. Typically because the data is just not there to 'select for update.'

    And of course I do use 'select for update.' In BL, when required. For instance to make sure I can delete or update existing data, by first 'selecting for update' those rows.
    Or when too-strong serialization (reason 3) is acceptable.

    In the example I gave in this post reason 4 applies.

    When designing/coding procedure p_check_already_used, I usually at the same time start thinking about what might happen (i.e. go wrong) if two sessions invoke the procedure at near same time. And in this case, for a same username value. Things might indeed go wrong: the procedure could tell both callers 'yes, this username is still available.' Now suppose we have scenario 2 at hand (username not in use and also no inuse_check available). In this case there is no web_user nor web_user_in_use_check row for this username. So I cannot 'select for update' any data to serialize two calls for the same username. Dbms_lock is my only option then.
    In the other scenarios (1, 3ab, 4), similar arguments can be given.

    Let me also give you a highlevel explanation of why I use it in my book, which concerns (2) DL code serialization.

    Suppose we have a parent and a child table. And we want to implement the constraint constraint "no more than ten child rows per parent row." Current state is: one parent row that already has nine child rows.
    Now come along two transactions TX1 and TX2. Both insert a new child row under the parent row. The constraint is implemented by (among others) a after insert trigger on the child row. The trigger executes a query that counts the number of child-rows for the current parent and raises an error if more than ten are found.

    Obviously this needs to be serialized. If TX1 and TX2 happen at the (near) same time, the trigger will in both sessions count only ten rows (the query inside the trigger does not see the not yet committed insert of the other concurrent transaction). And allow both TX1 and TX2. However after both have comitted we end up with eleven child rows for the parent.

    Now we could fix this by first executing a 'select for update' on the parent row inside the trigger, before it counts the number of child rows. But this would just be too strong. For it would also prevent any third transaction, say TX3, that executes at the same time and wants to update any column value of that parent row.

    By instead of 'selecting for update the parent' row, using dbms_lock to acquire an application-lock, whose name would involve the PK-value of the parent-row, in X-mode, we prevent the too-strong serialization.

    ReplyDelete

Note: Only a member of this blog may post a comment.