Wednesday, June 27, 2012

Metadata in Software Development

In modern programming, there's a lot of cool stuff that can be done with metadata. Failing to consider it in one form or another means setting aside a major tool in any system architecture. Sadly, I see this from time to time: a project with a lot of information associated with it, where the metadata ends up being expressed only as requirements... and is never applied any further in automation. My post on Edgewater's blog highlights code generation as one of my favorite ways to leverage metadata, using things like standardized requirements documentation as the metadata source.

Wikipedia divides metadata into two categories: "data about data types," such as XSDs and WSDLs, and "data about content." The distinction is mostly useful for deciding how the information gets applied. According to their current article, metadata is akin to information about a container such as a box... how big is it? How much can it hold? Metacontent is information about what the box contains... what is it? How much does it weigh? What are its storage instructions? What are the contents of the instruction manual? The former is often best applied at design time. The latter is often more economically applied at runtime. I would argue that it goes deeper than that, as well. One man's metacontent is another's metadata. The term metadata could be used to cover:
  • raw data
  • structure/language definition (syntax, describing how to express a schema)
  • structure/language expression definition (schema)
  • content
  • content description/classification
Metadata in information systems has two major aspects: expression and application. Expression is everything from the media used to the language (even the characters) used to communicate information. Expression shapes how the information can be managed and transmitted, and even how it can be applied. Application is all about how the information is used, which affects the resources required to leverage it at the time it is applied.

In application development, metadata can take many forms. Here's a list of a few of them in the development world, along with the pros and cons of each:
  • BLOB (Fully Custom/Encoded Expression)
    • Notes:  At the core of all other computer data expression is a specialized BLOB.  I mean, on disk, there's no visible difference between an Excel spreadsheet and a JPEG image... it's just a block of bytes.  The meaning of each byte is interpreted by the application that uses it.  (Its beauty is in the eye of the beholder.)
    • Pros:  Most flexible expression.  Any information that can be expressed within a computer can be encoded to live in a binary file.
    • Cons:  Any "beholder" must typically be created from scratch, which tends to couple it tightly to its application.  Harder to manage in terms of media and transmission.
  • Plain Text
    • Pros:  Very flexible, customizable, can be loosely coupled to the application.  Lots of options with respect to media and transmission.
    • Cons:  Plain text can take any form and often has to be parsed using custom tools.  Care has to be taken to make sure plain text is extensible, human-readable, and structured enough for processing.
  • XML
    • Pros:  Structured and extensible by definition.  Many, many useful tools exist to define, manage, communicate, index, and consume XML.  Very loosely tied to its application(s).
    • Cons:  Not quite as human-readable as, say, plain text might be.
  • JSON
    • Pros:  Extensible and structured.  Blurs the line between code and metadata, since JSON (JavaScript Object Notation) is JavaScript object-literal syntax and can be evaluated by a JavaScript runtime as code.  Meant for web client consumption.
    • Cons:  Not nearly as broadly supported as XML; tools are not available in as many contexts.  Not so well supported for consumption outside web clients.
  • XAML
    • Pros:  XML-based and, like JSON, an object initialization notation.  Best known for describing UI elements in WPF and Silverlight.  Compiles directly to code.
    • Cons:  Requires the WPF or Silverlight runtimes, not well supported beyond client-side rich apps, and the media is typically described as "embedded in binary code".
  • Embedded in Programming Languages
    • Pros:  Typically compiled into code; metadata can be expressed as code literals or code attributes.  This means the data is almost instantly available to the running application that consumes it.  Done well, especially with attributes, this tends to provide meaningful, human-readable information to a programmer (typically more so than comments) and to a runtime processor at the same time.
    • Cons:  Embedded metadata must be processed at runtime, every time.  Done wrong, with large amounts of data involved, this can be time-consuming.  It typically forces metadata content management into source code control, which arguably may not be the best way to manage it.  (Source control of metadata is a "pro", but putting it in something like TFS can make it hard to reach.)  Literals and attributes can also tie metadata to the code that consumes it.
  • Excel Spreadsheets
    • Pros:  Typically easily read and shared; easy to manage and communicate as a file.
    • Cons:  Not so easy to collaborate on; concurrent content managers typically must merge changes.  Automation can be clunky.
  • Database Tables
    • Pros:  Easy medium for a developer to work with.
    • Cons:  Not so easy a medium for anyone else, hard to manage in source control, and requires custom tools for end users to work with if they need to.
  • SharePoint Lists
    • Pros:  Easy medium for anyone to use and collaborate on; SharePoint provides content management tools.  Data is easily decoupled, accessible via the web (and web services), and exportable to files such as Excel spreadsheets.
    • Cons:  Hard to keep metadata in sync with source code control.  (If this is a requirement, the workaround involves exporting documents and checking them in.)
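To make the "embedded in programming languages" option a little more concrete, here's a minimal C# sketch of metadata expressed through code attributes and applied at runtime via reflection. The names `FieldMetaAttribute`, `Customer`, and `MetadataValidator` are hypothetical, made up for illustration; the point is that the same attribute informs both the programmer reading the class and the runtime code that validates it.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attribute: carries display/validation metadata next to the property it describes.
[AttributeUsage(AttributeTargets.Property, AllowMultiple = false)]
public sealed class FieldMetaAttribute : Attribute
{
    public FieldMetaAttribute(string label) { Label = label; }

    public string Label { get; }
    public int MaxLength { get; set; } = int.MaxValue;
    public bool Required { get; set; }
}

// Hypothetical business class annotated with the metadata above.
public class Customer
{
    [FieldMeta("Customer Name", MaxLength = 50, Required = true)]
    public string Name { get; set; }

    [FieldMeta("E-mail Address", MaxLength = 100)]
    public string Email { get; set; }
}

public static class MetadataValidator
{
    // Reads the attribute metadata at runtime (via reflection) and checks an instance against it.
    public static List<string> Validate(object instance)
    {
        var errors = new List<string>();
        foreach (PropertyInfo prop in instance.GetType().GetProperties())
        {
            FieldMetaAttribute meta = prop.GetCustomAttribute<FieldMetaAttribute>();
            if (meta == null) continue; // no metadata on this property, nothing to check

            string value = prop.GetValue(instance) as string;
            if (meta.Required && string.IsNullOrEmpty(value))
                errors.Add(meta.Label + " is required.");
            else if (value != null && value.Length > meta.MaxLength)
                errors.Add(meta.Label + " must be at most " + meta.MaxLength + " characters.");
        }
        return errors;
    }
}
```

Note that the reflection walk happens every time `Validate` runs, which is exactly the runtime cost mentioned in the cons above; caching the property/attribute pairs per type is a common way to soften it.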
Metadata application, in programming terms, can happen at several different points. It can be applied at:
  • Design Time
    • Common application:  code generation based on metadata or metacontent information.
  • Run Time
    • Common application:  runtime behavior/presentation modification and/or extension based on metacontent information.  This could include interpreted language processors. One could argue that .NET Common Intermediate Language (CIL) is metadata for that reason, since the runtime JIT-compiles it into native code and executes that.
  • Test Time
    • Common application:  baseline test data, test configuration.  (In turn, these can be applied at "design time" such as generating unit test code, or runtime, such as gathering baseline test data to compare to results, or determining executability and/or parameters for tests.)
  • Deploy Time
    • Common application:  Deployment targets/configuration based on metacontent about the build contents.
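Here's a minimal sketch of the design-time case: a tiny generator that turns field metadata (the kind of thing that might live in a requirements spreadsheet or a SharePoint list) into C# source text. `FieldSpec` and `EntityGenerator` are hypothetical names; a real project would more likely use something like Visual Studio's T4 text templates, but the shape of the idea is the same.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical metadata record: one row per field, as it might be exported
// from a requirements spreadsheet, a SharePoint list, or a database table.
public sealed class FieldSpec
{
    public FieldSpec(string name, string type) { Name = name; Type = type; }

    public string Name { get; }
    public string Type { get; }
}

public static class EntityGenerator
{
    // Emits C# source text for a simple class from the field metadata.
    // At design time, the result would be written to a .cs file and compiled with the project.
    public static string Generate(string className, IEnumerable<FieldSpec> fields)
    {
        var sb = new StringBuilder();
        sb.AppendLine("public partial class " + className);
        sb.AppendLine("{");
        foreach (FieldSpec f in fields)
            sb.AppendLine("    public " + f.Type + " " + f.Name + " { get; set; }");
        sb.AppendLine("}");
        return sb.ToString();
    }
}
```

The generated class is marked `partial` so hand-written logic can live in a separate file and survive regeneration when the metadata changes.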
Code generation is one of the most compelling uses for metadata in information systems, but it's certainly not the only use. One of my other favorite topics, SharePoint, highlights this by making metacontent a cornerstone of content and document management, classification, reporting, and search indexing. Throwing a document in a plain old file share provides you with a little metadata; the folder names in its file path typically have meaning, as do its dates, times, and even the permissions applied to it. Throwing the same document in SharePoint provides a ton more information about the document content by default, and typically indexes it, making search a far more powerful tool.
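As a rough illustration of how little a plain file share gives you for free, this C# sketch pulls the folder names, size, and timestamps for a file using `System.IO.FileInfo`. `FileShareMetadata.Describe` is a hypothetical helper; permissions are left out because reading ACLs is Windows-specific.

```csharp
using System;
using System.IO;

public static class FileShareMetadata
{
    // Collects the "free" metadata a plain file share offers for a document:
    // the folder names in its path, its size, and its timestamps.
    public static string Describe(string path)
    {
        var info = new FileInfo(path);

        // Split the containing directory into its folder names.
        string[] folders = info.DirectoryName.Split(
            new[] { Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar },
            StringSplitOptions.RemoveEmptyEntries);

        return string.Join(Environment.NewLine, new[]
        {
            "Name:     " + info.Name,
            "Folders:  " + string.Join(" > ", folders),
            "Size:     " + info.Length + " bytes",
            "Created:  " + info.CreationTime,
            "Modified: " + info.LastWriteTime
        });
    }
}
```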

Sunday, June 10, 2012

Keeping in the Code

At the end of the day, the business solution is always the most important part of the equation, but it's not the only part.  While I'm working on a solution, I'm also looking at tools, scaffolding, and frameworks.  This is especially true if others are going to be working on the project, and that accounts for nearly every non-trivial project.

How easy is it to set up?  How easy is it to work with?  Do the expressions make sense?  Can I hand it off to my least experienced teammate, get them to pick this up, and expect reasonable results?  (For that matter, can I hand it off to my most experienced teammate and expect them to respect the design decisions I made?)

Keeping my head in the code is critical.  Losing touch with tools means shooting in the dark on the above questions.  It doesn't matter what their experience is: if you ask someone to push a thumbtack into a corkboard and hand them the wrong tools for the job, they won't be able to get the thumbtack into the corkboard... or you'll nuke your budget paying for tools that are overpowered for the job.  (But that thumbtack will be SO IN THERE!)

In any case, in most projects, after the architecture and technical designs have been sorted out, frameworks built, and automation put in place, I'll take on the coding, too.

Of course, I've said this before...  if you can really simplify the work, what's to stop you from taking the extra step and automating it?   I'm always eyeing code, especially "formulaic", repetitive stuff, looking for opportunities to simplify, abstract, and/or automate.

Saturday, June 9, 2012

Custom GUIDs

Caught a question from Stacy Draper (@StacyDraper) this morning about custom GUIDs that are more recognizable.

It reminded me of a post I saw recently about Facebook using hex characters to make IPv6 addresses more recognizable.

Here's what I was thinking... create a GUID that has an embedded word.

For example, the following code creates a Guid that always starts with FACEB00C:
```csharp
using System;

namespace CustomGuidTest
{
    class Program
    {
        static void Main(string[] args)
        {
            Guid customGuid = GenerateCustomGuid();
            Console.WriteLine(customGuid);
        }

        static Guid GenerateCustomGuid()
        {
            // Backwards, but required to achieve the desired result: the Guid(byte[])
            // constructor reads the first four bytes as a little-endian Int32,
            // so 0x0C, 0xB0, 0xCE, 0xFA prints as "faceb00c".
            byte[] custom = new byte[] { 0x0C, 0xB0, 0xCE, 0xFA };

            byte[] random = Guid.NewGuid().ToByteArray();
            byte[] final = new byte[16];
            for (int idx = 0; idx < 16; idx++)
            {
                // Custom bytes first, then the remaining random bytes.
                if (idx >= custom.Length)
                    final[idx] = random[idx];
                else
                    final[idx] = custom[idx];
            }
            return new Guid(final);
        }
    }
}
```
Example output:
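For what it's worth, the "backwards" byte order isn't a fluke: .NET's `Guid(byte[])` constructor reads the first four bytes as a little-endian `Int32` (and the next two pairs as little-endian `Int16`s), so `0x0C, 0xB0, 0xCE, 0xFA` comes out as `faceb00c`. If you'd rather not think about byte order at all, here's an alternative sketch (a different approach from the one above) that works with the string form instead. `CustomGuids.WithHexPrefix` is a hypothetical helper; it limits the prefix to the first 8 hex digits so the version digit of the random GUID isn't overwritten.

```csharp
using System;
using System.Linq;

public static class CustomGuids
{
    // Builds a random Guid whose first hex group starts with the given prefix.
    // Working with the 32-digit string form sidesteps the byte-order quirk entirely.
    public static Guid WithHexPrefix(string prefix)
    {
        if (prefix == null || prefix.Length > 8 || !prefix.All(Uri.IsHexDigit))
            throw new ArgumentException("Prefix must be 0-8 hex digits.", nameof(prefix));

        // "N" format: 32 hex digits, no hyphens. Keep the random tail, replace the head.
        string random = Guid.NewGuid().ToString("N");
        return new Guid(prefix + random.Substring(prefix.Length));
    }
}
```

Usage is just `Console.WriteLine(CustomGuids.WithHexPrefix("FACEB00C"));`; since `Guid.ToString()` emits lowercase hex, the result starts with `faceb00c`.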

Friday, June 8, 2012

Hobby Project Supporting #NoKidHungry Now Has Free Edition

I'm pleased to say my hobby project, a Windows Phone app I call "Jimmy Sudoku", is now available both for free and for purchase.

The two SKUs in the Windows Phone App Marketplace are identical.  

The free version is available in almost all markets around the world (including the US). 

The paid version is only available in the US and 100% of the proceeds continue to support #NoKidHungry.

Link to Free SKU

Link to #NoKidHungry SKU

Please...  Enjoy!  :)

Sunday, June 3, 2012

GSSPUG Hub (Free App) for Windows Phone Now Available

My "artisan portfolio" of Windows Phone apps just DOUBLED in size!  Yes, I've now successfully published my second Windows Phone app.  :)

The Granite State SharePoint Users Group Hub is a somewhat minimal app, but if you're a member of the group, it's got some useful features.   My favorites are being able to get info about the next meeting (both in the app and as a live tile) and being able to RSVP through Eventbrite.

The direct link to find it in the Marketplace on your Windows Phone is this.

Regarding the name...  GSSPUG?  Ya, I know... it's not quite as intuitive as NHSPUG...    

If you're from New Hampshire, you know you search for "Granite State" any time you're looking for something local...  and if you don't know that, it probably is just as well you don't find it.  ;)

One other nice thing is that the content is largely driven from the group's web site, which, of course, is a SharePoint site.   The app does require a network connection, but its content can be updated without having to go through the week-long process of publishing an update.

Like Jimmy Sudoku, the app uses your phone's system-wide theme colors.

Essentially this is what ends up in the Hub app.

And it appears like so: