Introduction to MapReduce for .NET Developers

.NET, Software Development May 6th, 2009

The basic model for MapReduce derives from the map and reduce concept in functional languages like Lisp.
In Lisp, a map takes as input a function and a sequence of values and applies the function to each value in the sequence.
A reduce takes as input a sequence of elements and combines all the elements using a binary operation (for example, it can use “+” to sum all the elements in the sequence).
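The same two ideas exist in C# through LINQ: Select plays the role of map and Aggregate plays the role of reduce. The sketch below (FunctionalBasics is just an illustrative name) squares each value in a sequence and then combines the results with “+”:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FunctionalBasics
{
    // map: apply a function to each value in the sequence
    public static IEnumerable<int> Squares(IEnumerable<int> values)
    {
        return values.Select(x => x * x);
    }

    // reduce: combine all the elements using a binary operation ("+" here)
    public static int Sum(IEnumerable<int> values)
    {
        return values.Aggregate(0, (total, x) => total + x);
    }
}
```

For example, Sum(Squares(new[] { 1, 2, 3 })) evaluates to 1 + 4 + 9 = 14.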

MapReduce, inspired by these concepts, was developed as a method for writing processing algorithms for large amounts of raw data. The amount of data is so large that it can’t be stored on a single machine and must be distributed across many machines in order to be processed in a reasonable time.
In systems with such data distribution, traditional centralized processing algorithms are impractical: just moving the data to the single CPU running the algorithm incurs huge network costs and can take months (!) of transfer time.
Therefore, processing such massive scales of distributed data implies the need for parallel computing allowing us to run the required computation “close” to where the data is located.
MapReduce is an abstraction that allows engineers to write such processing algorithms in a way that is easy to parallelize while hiding the complexities of parallelization, data distribution, fault tolerance, etc.

This value proposition for MapReduce is outlined in a Google research paper on the topic:

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.

Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.

Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google’s clusters every day.

The MapReduce Programming Model

As explained earlier, the purpose of MapReduce is to abstract parallel algorithms into map and reduce functions that can then be executed on a large-scale distributed system.
To understand this concept better, let’s look at a concrete MapReduce example: counting the number of occurrences of each word in a large collection of documents:

map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));

The map function goes over the document text and emits each word with an associated value of “1”.

The reduce function sums together all the values for each word, producing the number of occurrences of that word as a result.

First we go through the mapping phase where we go over the input data and create intermediate values as follows:

  • Records from the data source (lines out of files, rows of a database, etc.) are fed into the map function as <key,value> pairs. For example: <filename, file content>.
  • The map function produces one or more intermediate values, each along with an output key, from the input.

After the mapping phase is over, we go through the reduce phase to process the intermediate values:

  • After the map phase is over, all the intermediate values for a given output key are combined together into a list and fed to the reduce function.
  • The reduce function combines those intermediate values into one or more final values for that same output key

Notice that both the map and the reduce functions run on independent sets of input data. Each run of the map function processes its own data source, and each run of the reduce function processes the values of a different intermediate key.

Therefore both phases can be parallelized with the only bottleneck being the fact that the map phase has to finish for the reduce phase to start.
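On a single machine, this structure can be sketched with PLINQ (part of .NET since 4.0). Both phases run in parallel, and the ToLookup call acts as the barrier between them: it has to consume all of the map output before any reduce work starts. ParallelWordCount is a hypothetical name used only for this sketch, and it is of course not a distributed runtime:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelWordCount
{
    public static Dictionary<string, int> Count(IEnumerable<string> documents)
    {
        // Map phase: each document is processed independently, in parallel.
        ILookup<string, int> intermediate = documents
            .AsParallel()
            .SelectMany(doc => doc
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(word => new KeyValuePair<string, int>(word, 1)))
            // Barrier: ToLookup consumes all map output and groups it by key
            // before the reduce phase below can start.
            .ToLookup(pair => pair.Key, pair => pair.Value);

        // Reduce phase: each key's values are summed independently, in parallel.
        return intermediate
            .AsParallel()
            .ToDictionary(group => group.Key, group => group.Sum());
    }
}
```

The order in which values arrive within a group is not deterministic here, which is fine because addition doesn’t care about order.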

The underlying system running these methods takes care of:

  • Initializing a set of workers that can run tasks (map or reduce functions).
  • Taking the input data (in our case, lots of document filenames) and sending it to the workers to map.
  • Streaming values emitted by the map function to the worker (or workers) doing the reduce. Note that the system doesn’t have to wait for a map run to finish going over its entire file before it starts sending the emitted values to the reducer, so it can prepare the data for the reducer while the map function is still running.
    (In Hadoop: send the map values to the reducer node and handle grouping by key.)
  • Handling errors: supporting a reliable, fault-tolerant process, as workers may fail, the network can crash and prevent workers from communicating results, etc.
  • Providing status and monitoring tools.
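One detail hidden in the streaming bullet is how the system decides which reducer receives which key. The Google paper’s default partitioning function is hash(key) mod R, where R is the number of reduce tasks; this guarantees that all values for one key reach the same reducer. A minimal single-process sketch, assuming string keys and a hypothetical KeyPartitioner class. It uses an FNV-1a-style hash because string.GetHashCode() is randomized per process on .NET Core, so it can’t be shared between machines:

```csharp
using System.Collections.Generic;

public static class KeyPartitioner
{
    // hash(key) mod R: every occurrence of a key maps to the same reducer.
    public static int PartitionFor(string key, int reducerCount)
    {
        // FNV-1a-style hash over the string's UTF-16 chars; stable across
        // processes, unlike string.GetHashCode() on .NET Core.
        uint hash = 2166136261u;
        unchecked
        {
            foreach (char c in key)
            {
                hash ^= c;
                hash *= 16777619u;
            }
        }
        return (int)(hash % (uint)reducerCount);
    }

    // Route each intermediate <key, value> pair into one bucket per reducer.
    public static List<KeyValuePair<string, int>>[] Shuffle(
        IEnumerable<KeyValuePair<string, int>> pairs, int reducerCount)
    {
        var buckets = new List<KeyValuePair<string, int>>[reducerCount];
        for (int i = 0; i < reducerCount; i++)
        {
            buckets[i] = new List<KeyValuePair<string, int>>();
        }

        foreach (var pair in pairs)
        {
            buckets[PartitionFor(pair.Key, reducerCount)].Add(pair);
        }
        return buckets;
    }
}
```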

A Naive Implementation in C#

Let’s see how we can build a naive MapReduce implementation in C#.

First, we define a generic class to manage our Map-Reduce process:

public class NaiveMapReduceProgram<K1, V1, K2, V2, V3>

The generic types are used the following way:

  • (K1, V1) – key-value types for the input data
  • (K2, V2) – key-value types for the intermediate results (the results of our Map function)
  • V3 – The type of the result for the entire Map-Reduce process

Next, we’ll define the delegates of our Map and Reduce functions:

public delegate IEnumerable<KeyValuePair<K2, V2>>   MapFunction(K1 key, V1 value);
public delegate IEnumerable<V3>                     ReduceFunction(K2 key, IEnumerable<V2> values);
private MapFunction _map;
private ReduceFunction _reduce;
public NaiveMapReduceProgram(MapFunction mapFunction, ReduceFunction reduceFunction)
{
    _map = mapFunction;
    _reduce = reduceFunction;
}

(Yes, I realize I could use .NET’s Func<T1,T2,TResult> instead but that would just result in horribly long ugly code…)

Now for the actual program execution. The execution flow is as follows: We take the input values, pass them through the map function to get intermediate values, we group those values by key and pass them to the reduce function to get result values.

So first, let’s look at the mapping step:

private IEnumerable<KeyValuePair<K2, V2>> Map(IEnumerable<KeyValuePair<K1, V1>> input)
{
    var q = from pair in input
            from mapped in _map(pair.Key, pair.Value)
            select mapped;

    return q;
}

Now that we have the mapped intermediate values, we want to reduce them. The Reduce function expects a key and all of its mapped values as input, so to do that efficiently we first group the intermediate values by key and then call the Reduce function for each key.

The output of this process is a V3 value for each of the intermediate K2 keys:

private IEnumerable<KeyValuePair<K2, V3>> Reduce(IEnumerable<KeyValuePair<K2, V2>> intermediateValues)
{
    // First, group intermediate values by key
    var groups = from pair in intermediateValues
                 group pair.Value by pair.Key into g
                 select g;

    // Reduce on each group
    var reduced = from g in groups
                  let k2 = g.Key
                  from reducedValue in _reduce(k2, g)
                  select new KeyValuePair<K2, V3>(k2, reducedValue);

    return reduced;
}

Now that we have the code for both steps, the execution itself is simply defined as Reduce(Map(input)):

public IEnumerable<KeyValuePair<K2, V3>> Execute(IEnumerable<KeyValuePair<K1, V1>> input)
{
    return Reduce(Map(input));
}
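Put together, the fragments above form one compilable class. The sketch below collects them in a single file, marks the fields readonly and folds the grouping and reducing queries into one query continuation (which behaves the same way). WordCountDemo is a hypothetical helper that wires in word-count map and reduce functions similar to the ones discussed in the next section:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class NaiveMapReduceProgram<K1, V1, K2, V2, V3>
{
    public delegate IEnumerable<KeyValuePair<K2, V2>> MapFunction(K1 key, V1 value);
    public delegate IEnumerable<V3> ReduceFunction(K2 key, IEnumerable<V2> values);

    private readonly MapFunction _map;
    private readonly ReduceFunction _reduce;

    public NaiveMapReduceProgram(MapFunction mapFunction, ReduceFunction reduceFunction)
    {
        _map = mapFunction;
        _reduce = reduceFunction;
    }

    // Map: flatten the output of the map function over every input pair.
    private IEnumerable<KeyValuePair<K2, V2>> Map(IEnumerable<KeyValuePair<K1, V1>> input)
    {
        return from pair in input
               from mapped in _map(pair.Key, pair.Value)
               select mapped;
    }

    // Reduce: group intermediate values by key, then reduce each group.
    private IEnumerable<KeyValuePair<K2, V3>> Reduce(IEnumerable<KeyValuePair<K2, V2>> intermediateValues)
    {
        return from pair in intermediateValues
               group pair.Value by pair.Key into g
               from reducedValue in _reduce(g.Key, g)
               select new KeyValuePair<K2, V3>(g.Key, reducedValue);
    }

    public IEnumerable<KeyValuePair<K2, V3>> Execute(IEnumerable<KeyValuePair<K1, V1>> input)
    {
        return Reduce(Map(input));
    }
}

public static class WordCountDemo
{
    // Map: emit <word, 1> for every word in the document text.
    public static IEnumerable<KeyValuePair<string, int>> MapWords(string documentName, string text)
    {
        foreach (string word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            yield return new KeyValuePair<string, int>(word, 1);
        }
    }

    // Reduce: sum the counts emitted for one word.
    public static IEnumerable<int> SumCounts(string word, IEnumerable<int> counts)
    {
        yield return counts.Sum();
    }

    public static Dictionary<string, int> Run(IEnumerable<KeyValuePair<string, string>> documents)
    {
        var program = new NaiveMapReduceProgram<string, string, string, int, int>(MapWords, SumCounts);
        return program.Execute(documents).ToDictionary(pair => pair.Key, pair => pair.Value);
    }
}
```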

The full source code and tests can be downloaded from here:

Map-Reduce Word Counting Sample – Revisited

Let’s go back to the word-counting pseudocode and write it in C#.

The following Map function gets a key and a text value and emits a <word, 1> key-value pair for each word in the text:

public IList<KeyValuePair<string, int>> MapFromMem(string key, string value)
{
    List<KeyValuePair<string, int>> result = new List<KeyValuePair<string, int>>();
    // RemoveEmptyEntries keeps consecutive spaces from producing empty "words"
    foreach (var word in value.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
    {
        result.Add(new KeyValuePair<string, int>(word, 1));
    }
    return result;
}

Having emitted a <word, 1> key-value pair for each word in each input source, we can group the results by word, and then our Reduce function can sum the values (which are all 1 in this case) for each word:

public IEnumerable<int> Reduce(string key, IEnumerable<int> values)
{
    int sum = 0;
    foreach (int value in values)
    {
        sum += value;
    }

    return new int[1] { sum };
}

Our program code looks like this:

NaiveMapReduceProgram<string, string, string, int, int> master = new NaiveMapReduceProgram<string, string, string, int, int>(MapFromMem, Reduce);
var result = master.Execute(inputData).ToDictionary(pair => pair.Key, pair => pair.Value);

The result dictionary contains <word, number-of-occurrences> pairs.

Other Examples

Distributed LINQ Queries. One of the POCs I’m working on, using the above naive LINQ-based implementation, is running a distributed LINQ query. Imagine you have a system where raw data is distributed across several SQL Servers. We can have our map function run a LINQ-to-SQL query on multiple DataContexts in parallel (the value input for the map function, V1, can be a DataContext) and then reduce the results to a single result set. This is probably a naive/simplified version of what the guys on Microsoft’s Dryad team are doing.

Count URL Visits. Suppose you have several web servers and you want to count the number of visits to each page on your site. You can do this in pretty much the same way the word-counting example works: the map function parses a log file and produces a <URL, 1> intermediate value for each request, and the reduce function then sums the values for each URL and emits <URL, number of visits>.
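A minimal sketch of that job, written as plain static methods so it runs without a cluster. It assumes the logs use the Common Log Format, for example 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326. UrlVisits and its Run driver (which stands in for the runtime’s grouping step) are illustrative names, not part of any library:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UrlVisits
{
    // Map: key = log file name (unused), value = the log file's text.
    // Emits <URL, 1> for every request line in Common Log Format.
    public static IEnumerable<KeyValuePair<string, int>> Map(string logName, string logText)
    {
        foreach (string line in logText.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // The request is the quoted part, e.g. GET /index.html HTTP/1.0
            int start = line.IndexOf('"');
            int end = start < 0 ? -1 : line.IndexOf('"', start + 1);
            if (end < 0) continue; // skip malformed lines

            string[] request = line.Substring(start + 1, end - start - 1).Split(' ');
            if (request.Length >= 2)
                yield return new KeyValuePair<string, int>(request[1], 1);
        }
    }

    // Reduce: sum the visits recorded for one URL.
    public static IEnumerable<int> Reduce(string url, IEnumerable<int> counts)
    {
        yield return counts.Sum();
    }

    // Local driver standing in for the MapReduce runtime: map, group, reduce.
    public static Dictionary<string, int> Run(IEnumerable<KeyValuePair<string, string>> logs)
    {
        return logs
            .SelectMany(log => Map(log.Key, log.Value))
            .GroupBy(pair => pair.Key, pair => pair.Value)
            .ToDictionary(g => g.Key, g => Reduce(g.Key, g).Single());
    }
}
```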

Distributed Grep. You can run a grep search on a large number of files by having the map function emit a line if it matches a given pattern. The reduce function in this case is just an identity function that copies the supplied intermediate data to the output.
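A single-process sketch of distributed grep, assuming each input record is <file name, file contents> and that the map function emits <file name, matching line> so the results stay grouped per file. The choice of key is an assumption; the technique itself only requires that matching lines are emitted. DistributedGrep is an illustrative name, and the pattern reaches the map function through a closure:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DistributedGrep
{
    // Builds a map function that closes over the pattern to search for.
    public static Func<string, string, IEnumerable<KeyValuePair<string, string>>> CreateMap(Regex pattern)
    {
        return (fileName, contents) => contents
            .Split('\n')
            .Where(line => pattern.IsMatch(line))
            .Select(line => new KeyValuePair<string, string>(fileName, line));
    }

    // Identity reduce: copy the intermediate values straight to the output.
    public static IEnumerable<string> Reduce(string fileName, IEnumerable<string> lines)
    {
        return lines;
    }

    // Local driver standing in for the runtime: map, group by key, reduce.
    public static List<KeyValuePair<string, string>> Run(
        IEnumerable<KeyValuePair<string, string>> files, Regex pattern)
    {
        var map = CreateMap(pattern);
        return files
            .SelectMany(file => map(file.Key, file.Value))
            .GroupBy(pair => pair.Key, pair => pair.Value)
            .SelectMany(group => Reduce(group.Key, group)
                .Select(line => new KeyValuePair<string, string>(group.Key, line)))
            .ToList();
    }
}
```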

Map-Reduce in the Real World

The real complexity and sophistication in MapReduce lies in the underlying system that runs and manages the execution of MapReduce jobs. Real-world MapReduce implementations, like Google’s system, Hadoop or Dryad, have to go beyond the naive implementation shown here and take care of things like resource monitoring, reliability and fault tolerance (for example, handling cases where nodes running map/reduce jobs crash or go offline due to network problems).

The following resources are worth checking out:


Developing a Robust Data Driven UI Using WPF – An Overdue Summary (and full source code)

.NET, Software Development, WPF April 15th, 2009

I wrote the stocky application more than a year ago as a research project aimed at proving that using WPF we can separate presentation metadata (XAML) from program logic. The goal was to provide the Duet team at SAP with a document reference sample for using M-V-VM to achieve this separation.

I started documenting the proof-of-concept in a series of posts but unfortunately after leaving SAP my interests (and work) shifted away from WPF and I didn’t find the time to finish the series.

I’ve received numerous requests to release the source code, but I couldn’t do so because it was part of a larger infrastructure codebase I wrote at SAP, which adds a lot of noise to the sample (and probably adds legal issues for me sharing it).
Anyway, I took some time off this afternoon to rewrite the sample independently so that I could share it:

It can be found on my SkyDrive

This, I guess, is the long overdue ending for the series:

  • Introduction – introduces the concept of M-V-VM and the reasoning behind it.
  • The DataModel – describes how to write the Model part of our application.
  • Stock DataModel Sample – provides a concrete implementation of a Stock model and its view.

However, if you’re interested in M-V-VM in WPF, there are numerous topics I didn’t get to cover that are definitely worth checking out:

Unit Testing

As I said in the introduction post, one of the most important benefits of separating the logic code from the presentation (XAML) is that it’s straightforward to unit test. In fact, my next post following the Stock DataModel Sample was going to be about unit testing, specifically how to test the DataModel and its provider, which is a bit tricky because of the use of threading.

This post is actually 99% done in the comments of the unit test code in DefaultStockQuoteProviderTest.cs in the provided source code. So do yourself a favor and go over the code. It’s not long and it’s very well documented…

Using Lambda Expression for DataBinding

Data-binding is pretty much at the heart of the M-V-VM concept, and it forces us to write value converters, which is pretty tedious and annoying.
Wouldn’t it be great if we could replace writing lots of IValueConverter classes like this:

<TextBlock Foreground="{Binding Change, Converter={StaticResource StockForegroundConverter}}" … />

[ValueConversion(typeof(double), typeof(Brush))]
public class StockChangeToBrushConverter : IValueConverter
{
    public object Convert(object value, Type targetType, object parameter, CultureInfo culture)
    {
        double change = (double)value;
        if (change == 0) return Brushes.Black;
        return (change < 0) ? Brushes.DarkRed : Brushes.Green;
    }

    public object ConvertBack(object value, Type targetType, object parameter, CultureInfo culture)
    {
        return double.NaN;
    }
}

To just the following XAML statement that embeds the conversion logic:

<TextBlock Foreground="{Binding Change,
    Converter={ change => if (change == 0) return Brushes.Black; return (change < 0) ? Brushes.DarkRed : Brushes.Green; }}" … />

M. Orçun Topdağı wrote an excellent series on using Lambda Expressions for data-binding in WPF to achieve just that:

Reference Applications and Guidance

I haven’t seen a lot of sample WPF LOB reference applications out there but here are some interesting links for further learning:


The Dark Side of LINQ

.NET August 5th, 2008

I’ve been having mixed feelings about LINQ for quite some time now.
Sure, it can make working with data sources a lot easier, and it can definitely save a lot of code…
But look at what happens to the following C# foreach statement:

List<KeyValuePair<string, string>> resultList = new List<KeyValuePair<string, string>>();
string[] paramsArray = parameters.Split(new char[] { '&' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string p in paramsArray)
{
    int index = p.IndexOf('=');
    if (index > 0)
    {
        string key = p.Substring(0, index);
        string value = p.Substring(index + 1);
        resultList.Add(new KeyValuePair<string, string>(key, value));
    }
}

IEnumerable<KeyValuePair<string, string>> result =
    resultList.Distinct((p1, p2) => p1.Key == p2.Key);

It turns into this query:

var distinctPairs = (from keyValuePair in parameters.Split(new char[] { '&' }, StringSplitOptions.RemoveEmptyEntries)
                     let index = keyValuePair.IndexOf('=')
                     where index != -1
                     let key = keyValuePair.Substring(0, index)
                     where !string.IsNullOrEmpty(key)
                     let valueText = keyValuePair.Substring(index + 1)
                     select new { Key = key, ValueText = valueText })
                             .Distinct( (p1, p2) => (p1.Key == p2.Key) )
                             .ToArray();

I don’t know about you, but I find the first version a lot more approachable, readable and quicker to understand. The same code in LINQ is not shorter and simply looks evil.
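Both snippets call Distinct with a lambda, but System.Linq’s Enumerable.Distinct only has overloads taking no comparer or an IEqualityComparer<T>, so the code above presumably relies on a helper extension method that isn’t shown. A minimal sketch of such a helper, where LambdaEqualityComparer and DistinctExtensions are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: adapts an equality lambda to IEqualityComparer<T>.
public sealed class LambdaEqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _equals;

    public LambdaEqualityComparer(Func<T, T, bool> equals)
    {
        _equals = equals;
    }

    public bool Equals(T x, T y)
    {
        return _equals(x, y);
    }

    // A lambda can't supply a matching hash code, so every item lands in the
    // same hash bucket and equality is decided by the lambda alone.
    public int GetHashCode(T obj)
    {
        return 0;
    }
}

public static class DistinctExtensions
{
    // Keeps the first item of each set of "equal" items, in source order.
    public static IEnumerable<T> Distinct<T>(this IEnumerable<T> source, Func<T, T, bool> equals)
    {
        return Enumerable.Distinct(source, new LambdaEqualityComparer<T>(equals));
    }
}
```

Because every item gets the same hash code, results stay correct but Distinct degrades to quadratic time on large inputs. On .NET 6 and later, DistinctBy(p => p.Key) from System.Linq does the same job without a custom comparer.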

LINQ is like the Force… It can be used to write wonderful code that is simple and functional, but it also has the potential of producing cryptic code that’s hard to maintain.

Use it wisely and don’t be tempted by its dark side…


Developing a Robust Data Driven UI Using WPF – Stock DataModel Sample

.NET, Software Development, WPF March 30th, 2008

In the previous post in this series we looked into the DataModel component of our architecture in detail and defined an abstract DataModel base class to derive our models from. In this post we’ll implement a concrete data model representing a stock’s value. Why a stock? It’s an object with a changing value, which requires our DataModel to constantly refresh and keep its data “alive”, and it’s simple to implement, which makes it a perfect example for our first DataModel.

The first thing we’ll do when defining our Stock DataModel is abstract the data source. This way we can easily implement several data sources for fetching a stock’s data and instantiate the DataModel with the right one (for example, read from Yahoo at runtime, or read from a fake data source during unit testing):

/// <summary>
/// Defines the interface allowing <see cref="StockDataModel"/> to read quotes from various providers.
/// </summary>
public interface IStockDataProvider
{
    /// <summary>
    /// Gets a given stock symbol's (given by <paramref name="symbol"/>) data.
    /// </summary>
    /// <param name="symbol">The stock's symbol.</param>
    /// <param name="name">The stock's company name.</param>
    /// <param name="quote">The last stock's quote.</param>
    /// <param name="change">The stock's change value.</param>
    /// <param name="open">The stock's open value.</param>
    /// <returns><b>True</b> if data was retrieved successfully; otherwise, <b>False</b>.</returns>
    bool TryGetData(string symbol, out string name, out double quote, out double change, out double open);
}

Now that we have our data source defined, we can implement different stock data providers for our DataModel to consume. Now, let’s go over the StockDataModel class:

public class StockDataModel : DataModel
{
    private string _symbol;
    private IStockDataProvider _quoteProvider;
    public StockDataModel(string symbol, IStockDataProvider provider)
    {
        _symbol = symbol;
        _quoteProvider = provider;
        this.State = DataModelState.Fetching; 

        // Queue a work item to fetch the symbol's data
        if (!ThreadPool.QueueUserWorkItem(new WaitCallback(FetchDataCallback)))
        {
            this.State = DataModelState.Invalid;
        }
    } 

    public string Symbol
    {
        get { return _symbol; }
    }

Our StockDataModel constructor takes the stock symbol that the model represents and an IStockDataProvider to fetch the stock’s data from. We set the initial DataModel state to Fetching and queue a work item for a background thread to update our model with the stock’s data: company name, quote, change value and open value. If we fail to queue the work item, then we put the model in an invalid state. Next, we need to define the properties exposed by StockDataModel for data binding.

public string Name
{
    get
    {
        VerifyCalledOnUIThread();
        return _name;
    }
    private set
    {
        VerifyCalledOnUIThread();
        if (_name != value)
        {
            _name = value;
            OnPropertyChanged("Name");
        }
    }
}

public double Quote
{
    get
    {
        VerifyCalledOnUIThread();
        return _quote;
    }
    private set
    {
        VerifyCalledOnUIThread();
        if (_quote != value)
        {
            _quote = value;
            OnPropertyChanged("Quote");
        }
    }
}
...


We’re using a private setter to update the property values and trigger a PropertyChanged event if required. You can also add calculated properties. For example:

public double ChangePercent
{
    get
    {
        if (double.IsNaN(Change) || double.IsNaN(Open))
            return double.NaN;

        // Floating-point division never throws: dividing by zero yields
        // Infinity or NaN, so guard against a zero open value explicitly.
        if (Open == 0)
            return double.NaN;

        return (Change / Open) * 100;
    }
}

In this case, it is important to remember to also raise the PropertyChanged event for ChangePercent whenever the values it depends on change. Now for the implementation of FetchDataCallback. This method is called on a background thread to update the stock data, so we’re free to perform expensive operations in it, such as calling a web service to fetch the stock’s data from an online provider (like Yahoo).

private void FetchDataCallback(object state)
{
    string fetchedName;
    double fetchedQuote;
    double fetchedChange;
    double fetchedOpen;

    if (_quoteProvider.TryGetData(_symbol, out fetchedName, out fetchedQuote, out fetchedChange, out fetchedOpen))
    {
        this.Dispatcher.BeginInvoke(
            DispatcherPriority.ApplicationIdle,
            new ThreadStart(
                delegate
                {
                    this.Name = fetchedName;
                    this.Quote = fetchedQuote;
                    this.Change = fetchedChange;
                    this.Open = fetchedOpen;
                    this.State = DataModelState.Active;
                }));
    }
    else
    {
        this.Dispatcher.BeginInvoke(
            DispatcherPriority.ApplicationIdle,
            new ThreadStart(
                delegate
                {
                    this.State = DataModelState.Invalid;
                }));
    }
}

In the previous post’s overview of the WPF threading model we noted the following:

If only the creator of a DispatcherObject can access it, how can a background thread interact with the user? The background thread does not access the UI directly, but it can ask the UI thread to perform a task on its behalf by registering work items with its Dispatcher, using the Dispatcher’s Invoke method (a synchronous call that returns when the UI thread has finished executing the delegate) or its BeginInvoke method (which runs asynchronously).

In the above code, after fetching the data with _quoteProvider.TryGetData, we need to communicate the changes back to the UI thread. We use the Dispatcher to set the new values of the DataModel properties, which ensures that our property change events are triggered on the UI thread.
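Those setters are also where calculated values need attention: ChangePercent has no setter of its own, so the Change and Open setters should raise its notification too. A WPF-free sketch of that pattern follows. StockFigures is a hypothetical class that implements INotifyPropertyChanged directly instead of deriving from DataModel, so it skips the UI-thread checks:

```csharp
using System.ComponentModel;

public class StockFigures : INotifyPropertyChanged
{
    private double _change = double.NaN;
    private double _open = double.NaN;

    public event PropertyChangedEventHandler PropertyChanged;

    public double Change
    {
        get { return _change; }
        set
        {
            // double.Equals treats NaN as equal to NaN, unlike "!=".
            if (_change.Equals(value)) return;
            _change = value;
            OnPropertyChanged("Change");
            OnPropertyChanged("ChangePercent"); // calculated from Change
        }
    }

    public double Open
    {
        get { return _open; }
        set
        {
            if (_open.Equals(value)) return;
            _open = value;
            OnPropertyChanged("Open");
            OnPropertyChanged("ChangePercent"); // calculated from Open
        }
    }

    public double ChangePercent
    {
        get
        {
            if (double.IsNaN(Change) || double.IsNaN(Open) || Open == 0)
                return double.NaN;
            return (Change / Open) * 100;
        }
    }

    protected void OnPropertyChanged(string propertyName)
    {
        PropertyChangedEventHandler handler = PropertyChanged;
        if (handler != null)
            handler(this, new PropertyChangedEventArgs(propertyName));
    }
}
```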

Keeping the Data Alive

So far, our code only fetches the stock data once. Let’s see what it takes to make our DataModel keep its data alive.

protected override void OnEnabled()
{
    _timer = new DispatcherTimer(DispatcherPriority.Background);
    _timer.Interval = TimeSpan.FromMinutes(5);
    _timer.Tick += delegate { ScheduleUpdate(); };
    _timer.Start(); 

    ScheduleUpdate();
}
protected override void OnDisabled()
{
    _timer.Stop();
    _timer = null;
}
private void ScheduleUpdate()
{
    VerifyCalledOnUIThread();
    // Queue a work item to fetch the quote
    if (ThreadPool.QueueUserWorkItem(new WaitCallback(FetchDataCallback)))
    {
        this.State = DataModelState.Fetching;
    }
}

The above code defines a timer that is active while the DataModel is enabled. The timer calls ScheduleUpdate every 5 minutes to perform the same background-thread data update we performed in our constructor. We’re using a DispatcherTimer so that the calls to ScheduleUpdate are made on the Dispatcher’s thread (the UI thread), which lets us update the DataModel’s state without hassle. If we had used System.Threading.Timer, ScheduleUpdate would be called on a thread pool thread, requiring the use of Dispatcher.BeginInvoke to update the state…

That’s it…

We’ve got the basic DataModel implemented. You can use it in your XAML window to see it working… To get a basic XAML window running, you’ll need to define a content control:

<ContentControl x:Name="_content" />

And set its content to a StockDataModel instance on your codebehind:

_content.Content = new StockDataModel("AAPL", someProvider);

Then all you need to do is define a data template for the StockDataModel type to control its appearance. Here’s a simple template, for example:

<DataTemplate x:Name="StockTemplate" DataType="{x:Type local:StockDataModel}">
   <StackPanel Orientation="Horizontal" mdb:EnableModel.DataModel="{Binding}" Height="30px" Width="Auto" ClipToBounds="True">
     <TextBlock Text="{Binding Name}" Foreground="#737271" Width="120" Padding="3,0,0,3" Style="{StaticResource StockText}" />
     <TextBlock Text="{Binding Quote}" Foreground="#737271" Width="55" Padding="0,0,0,3" Style="{StaticResource StockText}" />
   </StackPanel>
</DataTemplate>

You can find the code discussed in this article, plus my own implementation of an IStockDataProvider that reads stock data from Yahoo, here:

In the next post we’ll discuss DataModel unit testing and see how the StockDataModel tests are implemented.


Comments (5) imported from www.ekampf.com/blog/:

Sunday, March 30, 2008 10:45:52 PM (GMT Daylight Time, UTC+01:00)

Thanks for the series! Looking forward for the following parts. However, there’s a bug in the shown code as you cannot check if a value is NaN by comparing to double.NaN. You have to use double.IsNaN(…).

Use IsNaN to determine whether a value is not a number. It is not possible to determine whether a value is not a number by comparing it to another value equal to NaN.

Simon

Monday, March 31, 2008 4:39:37 AM (GMT Daylight Time, UTC+01:00)

Hey Simon, Thanks.

Fixing the code and the post…

Regards,
Eran

Eran Kampf

Friday, April 04, 2008 3:47:12 AM (GMT Daylight Time, UTC+01:00)

Very nice article series. Keep up the good work!

Kevin Kerr

Wednesday, May 28, 2008 2:55:06 PM (GMT Daylight Time, UTC+01:00)

Really great series, very nicely done.

Question: Why call VerifyCalledOnUIThread() in the ScheduleUpdate method? Since you’re calling BeginInvoke on the dispatcher inside FetchDataCallback all should be well, right?

Mike

Thursday, May 29, 2008 11:41:31 AM (GMT Daylight Time, UTC+01:00)

Hi Mike,

Good question. Notice that besides queuing a work item that calls FetchDataCallback, the ScheduleUpdate method also updates the model’s State to DataModelState.Fetching when that work item is queued. Since we’re changing the actual model, we need to make sure we’re doing it on the UI thread. Alternatively, we could have used a System.Threading.Timer to do the updates, in which case ScheduleUpdate() would be called directly on a background thread, but then we couldn’t set the model state to Fetching there; we’d have to send that change back to the UI thread.

Regards,
Eran Kampf

Eran Kampf


Developing a Robust Data Driven UI Using WPF – The DataModel

.NET, Software Development, WPF March 24th, 2008

In the first post in the series I gave an overview of the pattern we’ll be using.
This post will go deeper into the DataModel, as defined in the previous post:

The DataModel is defined exactly as the Model in MVC; it is the data or business logic that stores the state and does processing of the problem domain.
The DataModel abstracts expensive operations such as data fetching without blocking the UI thread. It can keep data “alive” by fetching it periodically from its source (example: stock ticker), merge information from several sources, etc.
The DataModel is completely UI independent and pretty much straightforward to unit test.

The DataModel exposes data in a way that makes it easily consumable by WPF. As such, all of its public APIs, called by WPF for data-binding, must be called on the UI thread only. It must not block the UI thread, because we want a robust, functional UI, so it usually performs operations on a background thread and uses the Dispatcher to send results back to the UI thread.

Therefore, the simplest DataModel implementation exposes several public Properties that expose data, implements INotifyPropertyChanged and/or INotifyCollectionChanged, and it abstracts the way information is fetched (using background threads to avoid blocking the UI thread when fetching the data is an expensive operation).

Two-way binding additionally requires a commit and rollback mechanism, a dirty flag, etc. We’ll get to that later on…

As the DataModel implementation needs to abstract expensive data fetching operations and work with multiple threads we need some basic understanding of WPF’s threading model before we look at the DataModel implementation…

WPF Threading Model – A Quick Overview

A typical WPF application uses two threads:

  • Rendering thread – runs in the background and handles rendering
  • UI thread – receives input, handles events, paints the screen and runs application code.

The UI thread queues work items in a Dispatcher object. The Dispatcher object selects work items on a priority basis and runs each one to completion.
Every UI thread must have at least one Dispatcher, and each Dispatcher can only use one thread to execute work items.

Therefore, in order to build a responsive UI that doesn’t block the UI thread, the application has to maximize the Dispatcher’s throughput by keeping work items small so as to minimize the time the Dispatcher spends processing each one, since a long-running item keeps the other work items waiting and causes the UI to lag.

In order to perform expensive operations without blocking the UI thread we can use a separate thread that will run in the background, leaving the UI thread free to process items in the Dispatcher queue. When the background thread is done processing it can report results back to the UI thread for display.
Doing this isn’t trivial as Windows only allows UI elements to be accessed by the thread that created them. This means that the background thread we used for some long-running task cannot access and update our UI when it is finished (or during work to show progress) – a background thread updating a control (such as a list box) during its rendering can cause strange UI behaviors that this limitation is there to prevent.

WPF uses the following design to enforce this kind of coordination between the UI thread and other threads:
Most of the classes in WPF derive from DispatcherObject. During construction, a DispatcherObject stores a reference to the Dispatcher linked with the current running thread, creating an association between itself and the thread that created it.
At the beginning of every method in the DispatcherObject, it calls VerifyAccess which compares the Dispatcher associated with the current thread with the Dispatcher stored during the object’s construction – if they do not match it throws an exception.
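The idea can be illustrated without WPF. The sketch below is a hypothetical, simplified stand-in for DispatcherObject, not WPF’s actual code: it records the thread that constructed it and compares threads directly (WPF goes through the Dispatcher instead), throwing when another thread touches it:

```csharp
using System;
using System.Threading;

public class ThreadAffineObject
{
    // Captured during construction, like DispatcherObject capturing its Dispatcher.
    private readonly Thread _owner = Thread.CurrentThread;
    private string _text = "";

    public bool CheckAccess()
    {
        return Thread.CurrentThread == _owner;
    }

    // Called at the start of every public member.
    public void VerifyAccess()
    {
        if (!CheckAccess())
        {
            throw new InvalidOperationException(
                "This object can only be accessed from the thread that created it.");
        }
    }

    public string Text
    {
        get { VerifyAccess(); return _text; }
        set { VerifyAccess(); _text = value; }
    }
}
```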

If only the creator of a DispatcherObject can access it, how can a background thread interact with the user?
The background thread does not access the UI directly, but it can ask the UI thread to perform a task on its behalf by registering work items with its Dispatcher, using the Dispatcher’s Invoke method (a synchronous call that returns when the UI thread has finished executing the delegate) or its BeginInvoke method (which runs asynchronously).

The DataModel Class

So now, after the brief discussion on the use of the Dispatcher we can start coding our base DataModel class.
We’ll start with the simple class and constructor definition:

public abstract class DataModel : DispatcherObject, INotifyPropertyChanged
{
    public DataModel()
    {
    }


We’re deriving from DispatcherObject because we need to have the Dispatcher available so that we can run background jobs that dispatch results to the UI thread.

As discussed earlier, each call to the DataModel should be made on the UI thread, so we would like to enforce that limitation at the beginning of each publicly exposed API. The DispatcherObject class we derive from contains a VerifyAccess() method that does just that. The method is public but, unfortunately, marked with the [EditorBrowsable(EditorBrowsableState.Never)] attribute, which hides it from IntelliSense and makes it hard to find for developers deriving their data models from our class.

To resolve this I simply defined a protected method as follows:

/// <summary>
/// Makes sure the call is made on the correct thread (the UI thread) by checking the current
/// thread against the Dispatcher we got when the DataModel was created.
/// </summary>
[System.Diagnostics.Conditional("DEBUG")]
protected void VerifyCalledOnUIThread()
{
    this.VerifyAccess();
}

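This method will be visible to anyone deriving from our class, and it simply calls VerifyAccess to make sure the call is made from the UI thread.
The Conditional attribute tells the compiler to omit calls to this method unless the DEBUG symbol is defined, so the check runs in debug builds and costs nothing in retail builds. Conditional symbols are case-sensitive: the attribute must name “DEBUG”, the symbol Visual Studio defines for debug builds, because “Debug” would never match and the check would silently never run.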

To support asynchronous data fetching, the DataModel should encapsulate information about its state: Active (data fetched), Invalid (error fetching data) or Fetching (fetch in progress).

public enum DataModelState
{
    /// <summary>
    /// The model is fetching data
    /// </summary>
    Fetching,
    /// <summary>
    /// The model is in an invalid state
    /// </summary>
    Invalid,
    /// <summary>
    /// The model has fetched its data
    /// </summary>
    Active
}

The data model’s state is exposed using a property:

public DataModelState State
{
    get
    {
        VerifyCalledOnUIThread();
        return _state;
    }
    set
    {
        VerifyCalledOnUIThread();
        if (value != _state)
        {
            _state = value;
            OnPropertyChanged("State");
        }
    }
}
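The state values are meant to be driven by the fetch itself: Fetching while a request is in flight, Active once data arrives, Invalid if it fails. The sketch below shows that cycle using C# async/await, which arrived after this post was written. FetchingModel&lt;T&gt; and RefreshAsync are hypothetical names, the enum is repeated so the block compiles on its own, and the UI-thread checks are left out so it runs in a console. When RefreshAsync is called on a WPF UI thread, await captures the Dispatcher’s SynchronizationContext, so the state updates after the await run back on the UI thread.

```csharp
using System;
using System.Threading.Tasks;

// Repeated from the DataModel code above so this sketch compiles on its own.
public enum DataModelState { Fetching, Invalid, Active }

// Hypothetical model showing the state transitions a fetch drives.
// UI-thread checks are omitted so the sketch runs in a console application.
public sealed class FetchingModel<T>
{
    public DataModelState State { get; private set; } = DataModelState.Invalid;
    public T Data { get; private set; }

    public async Task RefreshAsync(Func<Task<T>> fetch)
    {
        State = DataModelState.Fetching;
        try
        {
            // The fetch delegate can do its work on a background thread (e.g. via Task.Run).
            // On a WPF UI thread, await captures the Dispatcher's SynchronizationContext,
            // so the assignments below run back on the UI thread.
            Data = await fetch();
            State = DataModelState.Active;
        }
        catch (Exception)
        {
            // Any failure while fetching leaves the model Invalid.
            State = DataModelState.Invalid;
        }
    }
}
```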

We also implement INotifyPropertyChanged to allow the model to communicate changes in its values.
Since adding and removing event handlers on the PropertyChanged event is part of the DataModel’s public API, those calls also need to be verified as coming from the UI thread. We’ll define our own add/remove accessors to perform this verification:

protected virtual void OnPropertyChanged(string propertyName)
{
    VerifyCalledOnUIThread();

    if (_propertyChangedEvent != null)
    {
        _propertyChangedEvent(this, new PropertyChangedEventArgs(propertyName));
    }
}

#region INotifyPropertyChanged Members
public event PropertyChangedEventHandler PropertyChanged
{
    add
    {
        VerifyCalledOnUIThread();
        _propertyChangedEvent += value;
    }
    remove
    {
        VerifyCalledOnUIThread();
        _propertyChangedEvent -= value;
    }
}
#endregion

Every property we add to our data model will call OnPropertyChanged from its setter to notify listeners that its value has changed.
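A concrete model then follows one pattern for every property: verify the thread, compare with the current value, store it, and raise PropertyChanged. To keep the example runnable without WPF, DataModelSketch is a hypothetical stand-in for the DataModel class above that records the creating thread’s ID instead of deriving from DispatcherObject, and it uses a plain event rather than the custom add/remove accessors. StockQuoteModel and its Price property are illustrative names.

```csharp
using System;
using System.ComponentModel;

// Hypothetical, WPF-free stand-in for the DataModel base class: it remembers the
// creating thread's ID instead of deriving from DispatcherObject.
public abstract class DataModelSketch : INotifyPropertyChanged
{
    private readonly int _ownerThreadId = Environment.CurrentManagedThreadId;

    public event PropertyChangedEventHandler PropertyChanged;

    protected void VerifyCalledOnUIThread()
    {
        if (Environment.CurrentManagedThreadId != _ownerThreadId)
            throw new InvalidOperationException("DataModel accessed from a non-UI thread.");
    }

    protected virtual void OnPropertyChanged(string propertyName)
    {
        VerifyCalledOnUIThread();
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
    }
}

// Hypothetical concrete model with one property following the pattern.
public sealed class StockQuoteModel : DataModelSketch
{
    private decimal _price;

    public decimal Price
    {
        get
        {
            VerifyCalledOnUIThread();
            return _price;
        }
        set
        {
            VerifyCalledOnUIThread();
            if (value != _price) // only notify on a real change
            {
                _price = value;
                OnPropertyChanged(nameof(Price)); // nameof needs C# 6; older code passes "Price"
            }
        }
    }
}
```

Setting Price twice to the same value raises PropertyChanged only once, and reading Price from another thread throws.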

It’s Alive!

One more capability we’d like to add to our DataModel class is the ability to enable and disable it.
As defined earlier, the DataModel encapsulates the logic of fetching data and keeping it “alive” and up to date. To do that, it needs an internal timer for refreshing the information, or it registers for some change notification event on its source.
Either one keeps the DataModel alive (the timer or the source holds a reference to it) and can result in memory leaks, which is why we need a way to turn the DataModel on and off, allowing it to unregister from its data sources when that connection is no longer required:

public bool Enabled
{
    get
    {
        VerifyCalledOnUIThread();
        return _isEnabled;
    }
    private set
    {
        VerifyCalledOnUIThread();
        if (value != _isEnabled)
        {
            _isEnabled = value;
            OnPropertyChanged("Enabled");
        }
    }
}

public void Enable()
{
    VerifyCalledOnUIThread();

    if (!_isEnabled)
    {
        this.Enabled = true;
        OnEnabled();
    }
}

public void Disable()
{
    VerifyCalledOnUIThread();

    if (_isEnabled)
    {
        this.Enabled = false;
        OnDisabled();
    }
}
protected virtual void OnEnabled()
{
}
protected virtual void OnDisabled()
{
}
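Here is one way a derived model might use the OnEnabled/OnDisabled hooks, assuming its data source reports changes through an event. PriceSource, EnableableModel and QuoteModel are hypothetical, and EnableableModel is a reduced, WPF-free copy of the Enable/Disable logic above. The point is that the source’s event references the model only while the model is enabled, so disabling it breaks the link that would otherwise keep it alive.

```csharp
using System;

// Hypothetical data source that pushes price changes through an event.
public sealed class PriceSource
{
    public event Action<decimal> PriceChanged;

    public bool HasSubscribers => PriceChanged != null;

    public void Publish(decimal price) => PriceChanged?.Invoke(price);
}

// Hypothetical, WPF-free reduction of the Enable/Disable logic shown above.
public abstract class EnableableModel
{
    public bool Enabled { get; private set; }

    public void Enable()
    {
        if (!Enabled) { Enabled = true; OnEnabled(); }
    }

    public void Disable()
    {
        if (Enabled) { Enabled = false; OnDisabled(); }
    }

    protected virtual void OnEnabled() { }
    protected virtual void OnDisabled() { }
}

// Subscribes only while enabled. Once disabled, the source no longer references the
// model through its event, so the model can be garbage collected.
public sealed class QuoteModel : EnableableModel
{
    private readonly PriceSource _source;

    public QuoteModel(PriceSource source) { _source = source; }

    public decimal LastPrice { get; private set; }

    protected override void OnEnabled()
    {
        _source.PriceChanged += OnPriceChanged;
    }

    protected override void OnDisabled()
    {
        _source.PriceChanged -= OnPriceChanged;
    }

    private void OnPriceChanged(decimal price) { LastPrice = price; }
}
```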

When binding UI elements to the DataModel, we’ll need some mechanism to enable the DataModel when the element is loaded and disable it when the element is unloaded. There’s an elegant way to do this, which we’ll cover in a future post.

That’s it! We’ve got a basic class to derive our data models from. Note that we’re only addressing one-way data binding for the moment; we’ll address a two-way data model (which requires the ability to commit and roll back data, among other things) in a future post.

In the next post we’ll look into a concrete DataModel implementation for our Stocky application.

You can download the code for this post from here:


Developing a Robust Data Driven UI Using WPF – Introduction

.NET, Software Architecture, Software Development, WPF March 18th, 2008

WPF, Microsoft’s not-so-new-anymore UI technology, offers new capabilities that allow developers and designers to work together to create a stunning experience for their applications.

Power, however, does not come without complexity, and WPF does not provide a framework or a model to solve many of the problems faced by developers and designers when building an application:

1. Handling Rich Data Forms. Many applications, especially enterprise applications, rely heavily on displaying and manipulating data. Fetching the data while keeping the UI alive and responsive is a complicated task that is also hard to debug and requires an experienced developer.
Can we come up with a framework that will simplify data fetching?

2. Testability is a Requirement for Software Development Frameworks. Development organizations are no longer satisfied with simply reducing the cost of initial development, and there’s a growing demand for frameworks and tools that facilitate a sustainable and agile development process.
Can we come up with a model that will allow writing tests for the application’s UI and behavior?

3. Metadata Driven User-Interface. WPF provides XAML as a meta-model for UI definitions. However, there is no clear separation between metadata and code, which becomes a mess when designers and developers work together.
Can we come up with a model that allows developers to provide all the UI logic as closed building blocks that designers can use in a plug-and-play manner?

Providing a Framework for Building Robust, Data-Driven UIs

The Model\View\Controller (MVC) architectural pattern has long been used by complex applications to present large amounts of data to the user.
The pattern allows developers to separate the actual data (Model) from the user interface (View) and the business logic manipulating the data (Controller).

In the following set of articles I will present a variation of the MVC pattern tailored for modern UI development (in WPF) where we’d like the View to be the responsibility of a designer rather than a classic developer writing code.

I’ll be using the DataModel\ViewModel\View terminology to describe the pattern (although you may find the same pattern described using various other terminologies when browsing the net).

Introducing the DataModel\ViewModel\View Pattern

As mentioned earlier, the DataModel\ViewModel\View pattern is a variation of the MVC pattern. Its focus is on making the View, which is the actual UI presented to the user, the responsibility of a designer: a person who is generally more oriented towards graphics, art and interaction than towards classic coding.

The design of the view should be done in a declarative form (XAML) using a WYSIWYG tool (Expression Blend).
In short, the actual UI is developed with different tools and languages, by a person with a different skill set from the people writing the business logic and data backend.

To understand the meaning behind the DataModel\ViewModel\View terminology, let’s look at the following diagram describing a typical architecture for our application’s presentation layer using this pattern:


The DataModel

The DataModel is defined exactly as the Model in MVC; it is the data or business logic that stores the state and does processing of the problem domain.
The DataModel abstracts expensive operations such as data fetching without blocking the UI thread. It can keep data “alive” by fetching it periodically from its source (for example, a stock ticker), merge information from several sources, etc.
The DataModel is completely UI independent and pretty much straightforward to unit test.

The View

The View consists of visual elements and represents the actual user interface presented to the users (buttons, windows, graphics, etc.). It also defines interaction for keyboard shortcuts and other input devices.

The View is defined declaratively in XAML by the designer (usually using a tool such as Expression Blend).
Using such a declarative model makes it harder to represent some of the state that the original View in the MVC pattern was meant to deal with, including multiple modes of interaction (such as “view mode” and “edit mode”) that change the visuals and behavior of the controls.

This is where we make use of WPF’s advanced data binding mechanism. In a simple scenario, we can bind the View directly to the DataModel and use binding expressions to perform one-way binding for display-only values or two-way binding to allow editing values in the DataModel.

In most scenarios, however, only a small subset of the application’s UI can be bound directly to the DataModel. This can be the case when the DataModel is a pre-existing class or data schema over which the application developer has no control. The values exposed by the DataModel are likely to require some processing before they can be bound to UI elements. There may also be several complex operations that require code and do not fit into the strict declarative-only definition of a View, but are too application-specific to be part of the DataModel (which we might not have control over).
We may also want to save some view state such as view mode (view\edit\etc.) or item selection etc.

To bridge this gap between the declarative View and the DataModel we define the ViewModel…

The ViewModel

The ViewModel bridges between the DataModel and the View and performs all the tasks mentioned in the previous paragraph.
The term is meant to describe a “Model of a View”, which basically means that the ViewModel abstracts all the behavior logic behind a specific screen (View) in the application.
The ViewModel includes converters that transform DataModel types into View types, Commands that can be executed by the View’s controls to interact with the DataModel, and general behaviors that can be attached to UI elements in the View.
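As a rough illustration of those responsibilities, the sketch below shows a ViewModel that turns a raw DataModel value into display text, keeps a piece of view-only state (view or edit mode), and exposes a command the View can bind a button to. DelegateCommand, ViewMode and StockViewModel are hypothetical names, not WPF types. ICommand is the real interface WPF controls bind to; outside WPF it is also available from .NET’s System.ObjectModel assembly, which keeps the sketch runnable. Formatting the price inside the ViewModel is shown here as an alternative to writing a converter.

```csharp
using System;
using System.ComponentModel;
using System.Globalization;
using System.Windows.Input; // ICommand

// Hypothetical minimal ICommand that forwards to delegates (not a WPF-provided type).
public sealed class DelegateCommand : ICommand
{
    private readonly Action _execute;
    private readonly Func<bool> _canExecute;

    public DelegateCommand(Action execute, Func<bool> canExecute = null)
    {
        _execute = execute ?? throw new ArgumentNullException(nameof(execute));
        _canExecute = canExecute;
    }

    public event EventHandler CanExecuteChanged;

    public bool CanExecute(object parameter) => _canExecute == null || _canExecute();
    public void Execute(object parameter) => _execute();

    // Lets the ViewModel tell bound controls to re-query CanExecute.
    public void RaiseCanExecuteChanged() => CanExecuteChanged?.Invoke(this, EventArgs.Empty);
}

public enum ViewMode { View, Edit }

// Hypothetical ViewModel for one screen: it adapts a DataModel value for display and
// keeps view-only state (the current mode) that does not belong in the DataModel.
public sealed class StockViewModel : INotifyPropertyChanged
{
    private ViewMode _mode = ViewMode.View;

    public StockViewModel(decimal price)
    {
        Price = price;
        ToggleEditCommand = new DelegateCommand(
            () => Mode = Mode == ViewMode.View ? ViewMode.Edit : ViewMode.View);
    }

    public event PropertyChangedEventHandler PropertyChanged;

    // Stands in for a raw value read from the DataModel.
    public decimal Price { get; }

    // Display-ready text, so the View can bind to it without a converter.
    public string PriceText => Price.ToString("0.00", CultureInfo.InvariantCulture);

    public ViewMode Mode
    {
        get => _mode;
        private set
        {
            if (value == _mode) return;
            _mode = value;
            PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(nameof(Mode)));
        }
    }

    // Bound by a button in the View to switch between view and edit modes.
    public ICommand ToggleEditCommand { get; }
}
```

In XAML, a Button’s Command property would bind to ToggleEditCommand and a TextBlock would bind to PriceText, so the View itself needs no code.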

Summary and Next Steps

[Screenshot: the Stocky application]

The DataModel\ViewModel\View pattern is a simple yet powerful way for developers and designers to collaborate on building robust, data-driven WPF UIs.

It separates the data layer from the view layer, which makes it easier to develop granular components that are also unit-testable.

To demonstrate how the various pattern components are developed and used we’ll be going over the development process of a stock ticker widget-like application dubbed Stocky (screenshot on the right) and see how this development pattern simplifies the creation of an otherwise quite complicated little application.


Comments (6) from www.ekampf.com/blog/:

Tuesday, March 18, 2008 4:09:58 PM (GMT Standard Time, UTC+00:00)

In my company we are developing a very big medical system with a UI based on WPF.
We used a combination of Model-View-Presenter and the DataModel-View-ViewModel pattern introduced by Dan Crevier.
Looking forward to seeing your implementation.

Ran Trifon

Tuesday, March 18, 2008 4:52:31 PM (GMT Standard Time, UTC+00:00)

Well it’s pretty much the same…
The goal here is to summarize all the information in one place. Dan’s posts are pretty short and straightforward, aimed at experienced developers, while these posts are meant to be more detailed.
I am going to post about topics he didn’t mention though…

Eran Kampf

Tuesday, March 25, 2008 3:41:31 PM (GMT Standard Time, UTC+00:00)

Nice post – I’m really looking forward to seeing where you go with this. I’ve just recently been trying to find some guidance on setting up an MVC/MVP framework in WPF. Dan’s series is great, but I must admit that I really didn’t understand it all until I began my own implementation and things began to “gel”. It will be great to see another perspective on it.

Nigel Spencer

Tuesday, March 25, 2008 3:53:55 PM (GMT Standard Time, UTC+00:00)

Thanks Nigel,
Next post in the series is already available at http://www.ekampf.com/blog/2008/03/24/DevelopingARobustDataDrivenUIUsingWPFTheDataModel.aspx

Eran Kampf

Tuesday, March 25, 2008 6:38:44 PM (GMT Standard Time, UTC+00:00)

Eran:
Just wanted to say “keep up the good work”. Between your work and Dan’s series of articles, I think I’m starting to get a handle on this. My one request is that I’d like to see how your DataModel interacts with the DataAccess Layer against SQL Server. Maybe just something against Northwind. I realize this might be outside the main scope, but I think it would be interesting.
Sincerely,
Dale Williams

Dale Williams

Tuesday, March 25, 2008 7:34:38 PM (GMT Standard Time, UTC+00:00)

Hey Dale,
Thanks for the feedback :)
The 3rd post in the series will show a concrete DataModel example.
Since I was aiming to show how I build a Yahoo finance widget clone, I built the DataModel around that: keeping stock data up to date (kind of like in Dan’s article).
However, once you see how the DataModel fetching is implemented it doesn’t really matter if the actual data is fetched via SOAP call, http, or a DB access so you’ll be able to implement a one-way binding to a data source of your choice.
While the current implementation only deals with one-way binding (only fetching the data from the source, without the ability to update data on the source), I do plan to show how to implement two-way binding and support committing and rolling back data in future posts.
Thanks,
Eran

Eran Kampf


Managed Quake 3 Arena

.NET, Game Development, Software Development January 27th, 2008

Now that’s pretty cool…  A .NET port of the Quake 3 Arena source code.

[Screenshot: Managed Quake 3 Arena]

Uninstalling Previous Versions of Visual Studio 2008

.NET November 19th, 2007

Here are the instructions to follow before you install Visual Studio 2008 RTM:

  1. Go to the Control Panel and launch Add/Remove Programs
  2. Remove all instances of Visual Studio 2008/Codename Orcas products
  3. Remove any remaining supporting products in the specified order.
    • Remove “Crystal Reports for Visual Studio 2008 beta2” (or “Crystal Reports 2007”)
    • Remove “MSDN Library for Visual Studio 2008 Beta”
    • Remove “Microsoft SQL Server Compact Edition 3.5”
    • Remove “Microsoft SQL Server Compact Edition 3.5 Design Tools”
    • Remove “Microsoft SQL Server Compact Edition 3.5 for Devices”
    • Remove “Microsoft Visual Studio Performance Collection Tools”
    • Remove “Windows Mobile 5.0 SDK R2 for Pocket PC”
    • Remove “Windows Mobile 5.0 SDK R2 for Smartphone”
    • Remove “Microsoft Visual Studio Web Authoring Component / Microsoft Web Designer Tools”
    • Remove “Microsoft Visual Studio Tools for Office Runtime 3.0”
    • Remove “Microsoft Device Emulator 3.0”
    • Remove “Microsoft Document Explorer 2008”
    • Remove “Microsoft Visual Studio Codename Orcas Remote Debugger”
    • Remove “Microsoft Visual Studio 64bit Prerequisites Beta” (64-bit platforms only)
    • Remove “Microsoft .NET Framework 3.5”
    • Remove “Microsoft .NET Compact Framework 3.5”

Now that you’re sure all the beta bits are gone, you can install the Visual Studio 2008 RTM edition of your choice…

Note that the list above contains the products that were on my machine; you might have additional products that require removal on yours.

Update 20/11/2007:

ScottGu just published his own version of the list. When writing this post I started from the same list Scott has now made public, but I updated it according to the products that were installed on my machine (removed some items, and renamed some to match their names as they appear in beta2). So basically there shouldn’t be a difference between the two…


HEROES Happen {here}

.NET, Software Development November 7th, 2007

A new site dedicated to the launch events of Windows Server 2008, Visual Studio 2008 and SQL Server 2008 has been unveiled at http://www.heroeshappenhere.com/

Currently it contains some videos of Microsoft professionals sharing their feelings about the launch, but it’ll soon contain more information regarding the event:

Coming soon this site will provide you the portal for all launch information, event registration, learning resources and new and fun way where you can highlight how technology has made you a Hero. You will be able to experience launch in a whole new way from interactive community tools and forums, new demonstrations and online training options, and even a never before seen surprise from Microsoft which will enable you to experience launch in a new and exciting way. Heroes Happen Here, and make sure you don’t miss out.

I guess I’ll have to stay tuned then…

Introduction to LINQ

.NET, Software Development September 17th, 2007

I’m doing a 1-hour Introduction to LINQ session at SAP tomorrow.
Below is a link to the presentation; I’d be happy to hear comments about it if anyone has any…

IntroToLINQ
