Now is a good time to start applying analytics to your data. It’s highly unlikely that a middle market company is truly going to have a “big data” problem. So, let me make a fast suggestion before you get too carried away installing Hadoop and creating an array of EC2 instances. Microsoft’s SQL Server is a nice product for building out your first simple data mart. You can run it locally on your own server, or you can host it very cost effectively at Rackpace. An 8GB (memory), with 320 GB of disk storage, Windows 64-bit server with SQL Server – Web Edition (pre-installed) will run you $468/month. Basically, $6K a year for a box that you don’t have to manage or maintain.
Why not MySQL? — after all, it’s free. You can just as easily host a MySQL server internally or at Rackspace — where it will cost you a paltry $409/month. MySQL is a fine choice, nothing wrong with it at all.
Having said that, here’s my case for spending the extra $60/month to go with SQL Server.
- Active Directory. Chances are pretty good that you’re running a Windows domain at your company and that you are using Active Directory for security and authentication. SQL Server is tightly integrated with Active Directory — so your data mart users can use their pre-existing domain accounts to login to SQL Server.
- SQL Server connects very well with other Microsoft Office components. In particular, MS-Query works really well with SQL Server. Users can story queries directly in Excel spreadsheets to pull data from your data warehouse. (MS-Query works with any ODBC data source – so you can do the same thing with MySQL – but it is easier to set up with SQL Server).
- SQL Server’s “query” dialect, T-SQL, is a very mature and powerful version of the venerable SQL language. So you can tackle very complex queries and sophisticated multi-step processes directly in your database without having to resort to using external code.
- Linked Servers. One of SQL Server’s most powerful features is its’ ability to connect to other 3rd party databases (including any ODBC data sources). This lets you pull data from operational data stores into your data mart (or a staging area enroute to the data mart) from directly within the database. I’ve successfully connected to many, many different external data sources, including legacy ISAM file-based systems, using Linked Servers.
- SQL Server Agent. A full-featured scheduling system that is built directly into the database server. You can create complex T-SQL scripts and schedule them to run on just about any timetable that you can imagine. The scheduler can email out success/failure messages and it logs output to standardized windows logs. The combination of linked servers and SQL Server agent provides you with the equivalent of an “application server” within the database.
- Alerts. Over time, I think that you’ll find that your data mart becomes a critical element of your computing environment. SQL Server’s alerting system provides a framework for alerting staff when problems arise — things like disks filling up, users getting locked out, etc.
- Access to the CLR. Microsoft’s CLR, or “Common Language Runtime”, is the engine that runs C#, ASP.NET and Visual Basic .NET. If you have to, you can write complex code in C# and connect to it from directly within SQL Server.
- SSIS. SQL Server Integration Services. SSIS is an advanced toolkit for converting and cleansing data on its way into the data mart. I’ve always found that I can do most of the transformations that I need to do via linked servers, bcp and T-SQL scripts. But, SSIS is a powerful alternative for really complex transformation coding. (SSIS requires at least the “standard” edition of SQL Server). (Take a look at Business Intelligence Studio at the same time).
- Advanced analytical capabilities — such as hypercubes and bitmapped indexes. At a certain point you might find yourself wanting do some more advanced analytics, which might require hypercubes and/or bitmapped indexes. You can easily upgrade your SQL Server to the Enterprise Edition to access these features. (Which, I will admit, is a little more pricey).
- A wealth of 3rd party tools and products. There a TONS of great 3rd party products that integrate well with SQL Server. For example, DBAmp, from forceAmp, is a SQL-Server based integration engine that connects SQL-Server databases to Salesforce.com instances. I’ve used DBAmp to integrate sales data with production data from our ERP systems inside of a datamart built with SQL Server.
Gartner agrees. SQL Server gets the “Magic Quadrant” distinction for operational database management systems.
SQL Server only runs on Windows-based operating systems and some of the advanced management features of the server need to be accessed from the Management Studio which only runs on Windows. However, almost EVERYTHING that you can do from the management console has a T-SQL equivalent — so you can generate T-SQL scripts from the Management Studio and run them from other platforms as you like. For this purpose, I recommend that you get a copy of RazorSQL, a dynamite multi-platform query and analysis tool.
No need to wait. It’s easy and cost-effective to build your first data mart with SQL Server.