Well?
Well… What were you expecting?
Distributed Object Caching: Memcached & Velocity
I’m working on a new project at work where we’re dealing with data that updates frequently, at unpredictable times, used in across several different front-end services, and needs to scale to pretty decent traffic levels without going nuts on buying more hardware.
So, given all that, one of the things we’re looking at is using a distributed object caching layer, such as memcached. If you’re not sure what this technology does, the quick summary is that it is used to store commonly accessed data in memory on your servers. One of the most common uses is to cache results from database queries.
memcached started it’s life at Danga Interactive to solve issues scaling LiveJournal at 20 million+ pageviews per day. It has a proven track record in the Unix world, and a fairly significant base of knowledge on what works and various workarounds and solutions.
Whilst memcached is from Unix, there are also Windows based ports of the server, and also .NET clients so using it in our environment shouldn’t be an issue from the technical side.
Recently Microsoft also announced their entry into this space with a project code named Velocity. It’s pretty similar to memcached, but also has some additional functions allowing things like Tagging and Regionalising (Partitioning) data. There’s also more support at the moment for different cache expiry methods, and the roadmap includes additional redundancy bits too.
For anyone who is considering how their applications will scale up, there’s plenty more to read on the subject.
Dare Obasanjo has a post from July 2007 about memcached on Windows, and also more recently about Velocity. Scott Hanselman (Who I’m happy to say is coming to Tech.Ed Australia 2008!) has a podcast up about Velocity, talking with Anil Nori - one of the smart fellows responsible for Velocity.
I’ll write some more on this as we progress down the build of this application.
LINQ to SQL Caching Gotcha
So, today I discovered an issue which related to me doing two calls something a little like this:
- Execute dc.sp_Proc1
- If some condition exists, execute dc.sp_Proc2, and then Execute dc.sp_Proc1 again with the same parameters.
- Insert some records into the database.
The problem is, the first time you execute the sproc, it caches the result. This would be okay for most instances, but in mine - I’m actually after the updated result.
A quick bit of googling revealed this post by Chris Rock. This approach of “turn off object tracking” works Ok if you don’t need to insert records on that Data Context.
My quick, dirty, and (possibly) really wrong approach was just to spin up a new Data Context, and re-execute that sproc.
I promise I’ll find a more sane way of fixing this
LINQ to SQL Learnings: Getting rid of the CRUD
With many web 2.0 applications there’s a basic three-tier architecture.. In our case the client is a Flex 3/Caringorm application, the Services are WCF/ASP.NET Web Services, and the Database SQL 2005.
One of the typical approaches to creating Web Services for this type of system is to use a CRUD type pattern. That is: all methods are based around either Creating, Retrieving, Updating, or Deleting records. In most usually done on a per-table basis, and means that you’re effectively making the Web Services a HTTP enabled SQL client.
For our situation, this wasn’t really appropriate for a number of reasons, including complex relationships between tables, and a need to reduce the amount of network traffic.
Another concern, although relatively minor, is to reduce the amount of work needed by the Flex team to implement the Web Services.
Ideally, we wanted to be able to share business objects as widely as possible, to reduce the amount of rework needed by everyone involved in implementing the interfaces.
Therefore we chose to go with task, or semantic based methods, and using the objects as needed by the Flex front-end. The work of validation, and mapping to appropriate tables would be done by the Web services.
An example of this might be that a Document had many properties, such as Media Items (pictures, video, etc), Tags, Authors, etc. However, within the database there might be a necessity to track Document Versions, What versions are Live, the relationships between Documents, Document Versions and Media Items.
Because the objects that I needed to send/receive didn’t match the objects that needed to be saved in the database, I needed to write a lot of “left hand/right hand code”: ServiceDocument.Property = SQLDocument.Property. Most of this was fairly simple code to write, but tracking the places where this takes place can be grow to become quite a challenge when the solution grows to dozens of tables.
This is an approximate list of what I need to do to add a property to one table:
- Add the Property to the Service Types
- Add conversion pieces to transpose the Service Type to/from the LINQ to SQL Object equivalents.
- Add the column to the Table in the Database Model for LINQ to SQL
- Add the column to all Stored Procedures in the Database Model which reference this, removing and re-adding them if this means new properties too. Don’t forget to ensure the return types on the re-added Stored Procedures are set correctly.
- Add the columns to the actual Stored Procedures, update parameters, etc
- Add the column to the actual Table
I can only imagine the Version Control conflict chaos that would ensue if you had several people making these changes concurrently.
I highly recommend grouping changes into a per-table basis, because it can take a while to go through all the additional pieces you have referencing the LINQ to SQL and Service Type object equivilents.
LINQ to SQL Learnings: SPROC or not SPROC?
With databases in many organisations, there’s a rule that’s carved in stone:
Thou shalt use Stored Procedures for all table access.
Regardless of your position on this, some clients just don’t give us any other option. The application is never given direct table access, and anything that doesn’t go through a stored procedure just generates errors. This could be because they’re enforcing access control on the data to a finer level than just table level, or because they need to ensure records are kept of all confidential information.
The picture for LINQ to SQL with this situation appears at first to be rather rosy, through two things:
- You can assign sprocs that handle inserts, updates, and deletes against your tables.
- You can directly execute sprocs as methods in your application. So, you can select data out of tables via this method.
Both of these things, are true. However, what doesn’t get mentioned is that LINQ to SQL will still perform reads (using dynamic SQL) from your database if you (for example) try to access children of a relationship, or insert/update rows**. That is: You can’t stop additional queries against the database.
So, as far as I can tell - this means LINQ to SQL, as it stands, cannot be used for these situations. Ofcourse, this doesn’t prevent me from using LINQ to XML, LINQ to Entities, etc - it just means that our ways of executing SQL still have not advanced.
There’s a possible way I think this could be fixed, but it really requires a bit of a rethink as to how LINQ to SQL works - perhaps this could be done as a third party LINQ provider, but really it should be done by Microsoft in LINQ to SQL v.Next.
The way I’d like to see it fixed, would be for someone to create an analysis tool that runs at publish time, that generates Stored Procedures for all the LINQ statements in your code, and then forces the compiled LINQ commands to map through these sprocs.
This would have the benefit of rapid prototyping/development, and being able to still enforce access through sprocs.
* = Well, there might be a way, but I’ve not yet found it.
** = It does select’s to get the updated column values from, for example, computed columns.
LINQ to SQL Learnings: SqlDateTime Overflow on Autogenerated Column
This is the first in (hopefully) a series of quick things I’ve picked up whilst tackling the previously mentioned project.
So, I have a table something like this:
CREATE TABLE [dbo].[Product](
[ProductID] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](100) NOT NULL,
[Price] [int] NOT NULL,
[LastSaveTimestamp] [datetime] NOT NULL CONSTRAINT [DF_Product_SaveTimestamp
The key here is the default value on the column: LastSaveTimestamp.
If I then try to, say insert a new column into this table, for example using this code:
DatabaseContext dc = new DatabaseContext();
Product product = new Product();
product.Name = “test product”;
product.Price = 50;
dc.Products.InsertOnSubmit
Then I’d get an exception like:
System.Data.SqlTypes.SqlTypeException: SqlDateTime overflow. Must be between 1/1/1753 12:00:00 AM and 12/31/9999 11:59:59 PM..
The fix is actually really simple - In the table designer / DBML, you need to tell it that the column is auto-generated. Unfortunately this doesn’t seem to be automatically detected. It’s one of a few ‘just plain weird’ situations.
AzamSharp has the fix details, with a handy-dandy screenshot over on his blog.
