The New Google App Engine Blobstore API – First Thoughts

Cloud Computing, Software Development December 15th, 2009

Google’s App Engine 1.3.0 was released yesterday along with a brand new Blobstore API allowing the storage and serving of files up to 50MB.

Store and Serve – Files can be uploaded and stored as blobs, to be served later in response to user requests. Developers can build their own organizational structures and access controls on top of blobs.

The way this API works is pretty simple. To upload files, you call an API that manufactures a POST URL, and web forms containing file data are submitted to that URL. App Engine processes the POST request and creates the blobs in its storage (along with BlobInfo objects: read-only datastore entities containing the metadata for each blob). It then rewrites the request, removing the uploaded file data and replacing it with a Blobstore key pointing to the stored blob in the App Engine Blobstore, and calls your handler with this data.

To serve an existing blob in your app, you put a special header in the response containing the blob key. App Engine replaces the body of the response with the content of the blob.
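To make the round trip concrete, here is a toy, in-memory Python simulation of the two steps described above. It is not the App Engine API: BLOBS, BLOB_INFO, platform_upload, platform_serve and the X-Blob-Key header name are all hypothetical stand-ins for what the platform does around your handler.

```python
import uuid

# Hypothetical in-memory stand-ins for the Blobstore; NOT the App Engine API.
BLOBS = {}      # blob key -> raw bytes
BLOB_INFO = {}  # blob key -> metadata (read-only entities in the real service)

BLOB_KEY_HEADER = "X-Blob-Key"  # hypothetical header name


def platform_upload(form, handler):
    """Simulate the platform side of an upload: store every file field as a
    blob, rewrite the request so the handler sees blob keys instead of file
    bytes, then call the handler."""
    rewritten = {}
    for name, value in form.items():
        if isinstance(value, tuple):  # (filename, bytes) marks a file field
            filename, data = value
            key = uuid.uuid4().hex
            BLOBS[key] = data
            BLOB_INFO[key] = {"filename": filename, "size": len(data)}
            rewritten[name] = key
        else:
            rewritten[name] = value
    return handler(rewritten)


def platform_serve(response):
    """Simulate serving: if the handler set the blob-key header, the platform
    replaces the response body with the stored blob's content."""
    key = response["headers"].pop(BLOB_KEY_HEADER, None)
    if key is not None:
        response["body"] = BLOBS[key]
    return response


# Usage: the handler only ever receives the key, never the file bytes.
def upload_handler(fields):
    return fields["photo"]


photo_key = platform_upload({"title": "cat", "photo": ("cat.jpg", b"JPEGDATA")}, upload_handler)
served = platform_serve({"headers": {BLOB_KEY_HEADER: photo_key}, "body": b""})
```

Note that by the time upload_handler runs, the bytes are already in BLOBS, which is exactly what the concerns below are about.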

Now, this is pretty straightforward, but there are a few concerns with this approach:

1. What about request validation (authentication/authorization, etc.)?

When uploading files, the request reaches your code only after the blobs have already been processed and stored. This means you can only handle authentication/authorization, or even form validation, after the data has been stored.

This means you’ll have to write code to clean up the relevant blob entries in case of failed authentication/authorization/validation – more datastore API calls, more CPU…
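A minimal Python sketch of that cleanup burden: because the handler only runs after the blobs are stored, every rejection path has to delete them. The dict-based store, delete_blobs and the status codes are hypothetical stand-ins, not the Blobstore API.

```python
# Hypothetical stand-in for blobs the platform already stored before our handler ran.
blob_store = {"k1": b"first file", "k2": b"second file"}


def delete_blobs(keys):
    """Remove blobs we no longer want (in the real service: extra API calls, extra CPU)."""
    for key in keys:
        blob_store.pop(key, None)


def upload_handler(user, fields, blob_keys):
    """Validation can only happen here, after storage, so each failure path cleans up."""
    if user is None:  # authentication failed
        delete_blobs(blob_keys)
        return 401
    if not fields.get("title"):  # form validation failed
        delete_blobs(blob_keys)
        return 400
    return 200
```

Forgetting a single delete_blobs call on any failure path leaves orphaned blobs behind.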

It also means that, unless you take care of these special cases, any newbie hacker with a simple sniffer (or Firebug) can start uploading (and potentially serving) files off your service (see update).

2. No way to preprocess data

As the file data is already stored before your handler is called, there’s no way to preprocess submitted data other than reading it from the store, processing it and storing it again.

There’s also no straightforward API to access or store blob data in code, so the above process has to be implemented using URL fetching (fetch the image via an HTTP call, process it, and store it again using an HTTP POST call).

There must be a way for the Google App Engine team to wrap this up nicely and provide a clean API for doing this efficiently (while also solving the validation problem described above).

 

As the Blobstore API is still in an experimental phase, I guess we’ll see some quick progress on its development, and hopefully the Google team will solve the issues above.

At least now there’s the beginning of an alternative to Amazon S3 for AppEngine applications.

 

Update:

Brett Slatkin notes that when the API manufactures the POST URL used for uploading the files, it creates a unique, one-time URL, which mitigates any potential sniffing.
This fits perfectly when you’re rendering a web form to be submitted by the user. But it makes things harder if you’re trying to provide a REST API that allows uploading files (think of something like TwitPic, for example). In this case you’ll have to write your own code that simulates what a web form would do (get the files, create the one-time POST URL, call it, …)
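For the REST scenario, something has to produce the same multipart/form-data body a browser would send to the one-time URL. Below is a minimal Python sketch of that encoding, assuming simple text fields and in-memory files; build_multipart is a hypothetical helper, and obtaining the one-time URL is left to your app.

```python
import uuid


def build_multipart(fields, files, boundary=None):
    """Encode a form submission as multipart/form-data, like a browser would.

    fields: {name: str}
    files:  {name: (filename, bytes, content_type)}
    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    crlf = b"\r\n"
    parts = []
    for name, value in fields.items():
        header = f'Content-Disposition: form-data; name="{name}"'.encode("utf-8")
        parts.append(delimiter + crlf + header + crlf + crlf + value.encode("utf-8") + crlf)
    for name, (filename, data, content_type) in files.items():
        header = (
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}"
        ).encode("utf-8")
        parts.append(delimiter + crlf + header + crlf + crlf + data + crlf)
    body = b"".join(parts) + delimiter + b"--" + crlf  # closing boundary
    return body, f"multipart/form-data; boundary={boundary}"
```

The body and Content-Type value can then be POSTed with the standard library, e.g. urllib.request.Request(upload_url, data=body, headers={"Content-Type": content_type}, method="POST"), where upload_url stands for whatever one-time URL your app obtained.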


Insight: Hiring Programmers

Software Development November 30th, 2009

There’s a very interesting blog post over at Raw Thought on the topic of hiring programmers. It offers the following insight on hiring:

There are three questions you have when you’re hiring a programmer (or anyone, for that matter):

  • Are they smart?
  • Can they get stuff done?
  • Can you work with them?

Someone who’s smart but doesn’t get stuff done should be your friend, not your employee. You can talk your problems over with them while they procrastinate on their actual job.

Someone who gets stuff done but isn’t smart is inefficient: non-smart people get stuff done by doing it the hard way and working with them is slow and frustrating.

Someone you can’t work with, you can’t work with.

I think it’s a much better and more effective approach than the traditional method of asking cheesy, annoying riddles and problems…


Building an iPhone Application

Software Architecture, Software Development October 29th, 2009

Over the past few weeks I’ve been working on a new venture centered around the iPhone. Building our app has been quite an adventure, and we’ve experimented with several technologies that were new to us before settling on our current technology stack.
Now that we’ve finally got our stuff together and made an initial release to a group of testers, I thought I’d share some of the technology choices we’ve made and the reasons behind them.

First some information about the team

…because technology choices are affected by the team’s technical skillset.

  • We’re 3 developers (Yosi, Udi and myself) and one designer (the awesome Naor Suki).
  • We’ve allocated two developers for the iPhone and one for the backend APIs & website.
  • We’re all veteran developers with experience mostly on Microsoft’s development stack. This project meant going out of our comfort zone to a whole new set of technologies. Experience does make a difference in easing the learning curve…

iPhone Development

  • iPhone SDK: This one is obvious, right? We looked for alternatives to writing Objective-C. Unfortunately, Flash isn’t available for the iPhone (yet?), and MonoTouch looked promising but isn’t quite there…
    Besides, it’s always better to develop on the platform most developers are using, which means there’s a big community that can help when you get stuck. Being on Apple’s official stack also means we get the latest features without having to wait for a 3rd party to port them…
    To be perfectly honest, I’m not on the iPhone side of the development and did not actually write a single line of Objective-C code, but I noticed it took my teammates 1-2 weeks to get the hang of it.
    To me, the fact that the iPhone App Store is so successful and has so many apps with Objective-C as the development language (which is definitely harder than modern languages like Java) makes Apple’s achievement even more amazing…
  • Three20: A handful of UI components extracted from the Facebook iPhone app and open-sourced by their developer, Joe Hewitt (joehewitt). His announcement blog post details the libraries it contains, shows some demos, etc.
    The source code is also a great learning tool for how stuff is done on the iPhone.
  • json-framework: This is a pretty slick JSON parser for the iPhone. Hand parsing JSON in obj-C would not have been fun. This made it easy. I’m pretty sure I followed this tutorial to get it up and running.
  • ASIHttpRequest: A nice HTTP framework that makes it easy to handle asynchronous HTTP requests.
  • StackOverflow is an invaluable resource for asking questions and solving all sorts of problems. As Yosi, who’s been concentrating on the iPhone side of our development, puts it: “I don’t think any iPhone development could be done without StackOverflow”
  • MGTwitterEngine: an awesome Objective-C wrapper for the Twitter API, which we based our API on.
  • Google Analytics SDK: A library for sending information to Google Analytics from the iPhone. This is important for measuring how users interact with certain flows in our program. For example, it helps us measure the conversion on our signup flow: how many users go through the signup flow and finish, and if they don’t, which steps make them drop off?
    This kind of functionality is essential to measuring and improving UX flows…
  • Google Toolbox for Mac: A library for working with the different services exposed by Google.
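The signup-conversion measurement mentioned in the Google Analytics item boils down to a funnel calculation. A minimal Python sketch, assuming the tracked events have been exported as (user_id, step) pairs; the step names and the funnel_report helper are hypothetical and not part of the Google Analytics SDK.

```python
def funnel_report(events, steps):
    """events: iterable of (user_id, step_name) pairs.
    steps: ordered list of funnel step names.
    Returns a list of (step, users_reaching_step, fraction_of_previous_step)."""
    reached = {step: set() for step in steps}
    for user, step in events:
        if step in reached:
            reached[step].add(user)

    report = []
    previous = None
    for step in steps:
        # A user counts for a step only if they also reached every earlier step.
        users = reached[step] if previous is None else reached[step] & previous
        if previous is None:
            rate = 1.0
        else:
            rate = len(users) / len(previous) if previous else 0.0
        report.append((step, len(users), rate))
        previous = users
    return report
```

The step with the lowest fraction is where users drop off, which is the flow worth improving first.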
Backend Development

After playing around with ASP.NET MVC (which we all had a background in, having come from the Microsoft ecosystem) and Ruby on Rails (because it’s cheaper to host than ASP.NET and way simpler, faster and more fun to use IMHO), we’ve finally settled on Google AppEngine and django (Python).

We made the decision to base our development on django rather than on Google’s own webapp framework for the following reasons:

  • Lots of out-of-the-box features. django has been around for quite a while and is bundled with lots of features (like an easy-to-build admin interface, an authentication system, a validation system, etc.)
  • Big community. There are lots of people doing django out there, with lots of open source libraries and samples available. As a rule of thumb, it’s always better to be on the majority side…
  • Not specific to Google AppEngine. django is a standalone Python web development platform. While some parts of our code have to be AppEngine specific, it would still be considerably easier to move away from AppEngine (if we ever decide to do so) than if we were entirely Google specific.

Google AppEngine also has Java support. But using Python with django is way easier and has a lot more support when it comes to both AppEngine and web development. Seriously, if you’re thinking of using Java, don’t! Take the leap and go with Python…

Libraries we’ve used:

  • app-engine-patch: This library is absolutely amazing and a must if you’re using django on AppEngine. Since the AppEngine datastore API is not compatible with django’s API, a lot of the really cool time-saving features of django will simply not run on AppEngine (such as the admin UI, authentication and basically anything that requires data access). app-engine-patch loads django and patches it so it is compatible with the AppEngine API, making all those cool django features work. You just download their project template and start developing your application on top of it.
  • PyDev: An Eclipse plugin for editing and debugging Python and AppEngine applications. It might sound obvious, but I was actually using Notepad++ (on Windows) for development until I found out there’s a decent IDE I could use…
  • Piston: a django library for developing REST APIs. While it’s not entirely compatible with AppEngine, it only took a simple fork to edit those parts out…
  • GeoModel: provides basic indexing and querying of geospatial data on Google AppEngine.

Also, I would recommend taking the time to learn and understand how the AppEngine datastore works, so you’ll understand how to build your data model to run efficiently on Google’s platform.
The following two presentations from Google I/O are invaluable:

So what do you think? If you’re developing an iPhone app, I’m very interested to know what your technology choices were and the reasoning behind them…

Oh, and if you have an iPhone and you live in Israel (it’s a local app, so we’re limiting our efforts to Israel at the moment), please head over to our beta signup form and sign up :)


Data Mining – Handling Missing Values in the Database

Software Development August 14th, 2009

I’ve recently answered Predicting missing data values in a database on StackOverflow and thought it deserved a mention on DeveloperZen.

One of the important stages of data mining is preprocessing, where we prepare the data for mining. Real-world data tends to be incomplete, noisy, and inconsistent and an important task when preprocessing the data is to fill in missing values, smooth out noise and correct inconsistencies.

If we look specifically at dealing with missing data, there are several techniques that can be used. The right technique depends on the problem domain (the data’s domain: sales data? CRM data? …) and on our goal for the data mining process.

So how can you handle missing values in your database?

1. Ignore the data row

This is usually done when the class label is missing (assuming your data mining goal is classification), or when many attributes (not just one) are missing from the row. However, you’ll obviously get poor performance if the percentage of such rows is high.

For example, let’s say we have a database of student enrollment data (age, SAT score, state of residence, etc.) and a column classifying their success in college as “Low”, “Medium” or “High”. Let’s say our goal is to build a model predicting a student’s success in college. Data rows that are missing the success column are not useful in predicting success, so they could very well be ignored and removed before running the algorithm.
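A minimal Python sketch of technique #1, assuming rows are dicts with None marking a missing value; the field names and the max_missing threshold are illustrative.

```python
students = [
    {"age": 18, "sat": 1350, "state": "CA", "success": "High"},
    {"age": 19, "sat": 1100, "state": "NY", "success": None},  # class label missing
    {"age": 18, "sat": None, "state": None, "success": "Low"},  # two attributes missing
]


def drop_incomplete(rows, label, max_missing):
    """Keep rows that have a class label and at most `max_missing`
    other attributes missing (None)."""
    kept = []
    for row in rows:
        if row.get(label) is None:
            continue  # no label: useless for training a classifier
        missing = sum(1 for key, value in row.items() if key != label and value is None)
        if missing <= max_missing:
            kept.append(row)
    return kept
```

With max_missing=1 only the first student survives; raising the threshold to 2 keeps the third one as well.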

2. Use a global constant to fill in for missing values

Decide on a new global constant value, like "unknown", "N/A" or minus infinity, that will be used to fill all the missing values.
This technique is used because sometimes it just doesn’t make sense to try and predict the missing value.

For example, let’s look at the student enrollment database again. Assume the state of residence is missing for some students. Filling it in with some arbitrary state doesn’t really make sense, as opposed to using something like “N/A”.
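A minimal Python sketch of technique #2, again assuming dict rows with None for missing values; the default constant "N/A" is just one of the options mentioned above.

```python
def fill_constant(rows, attribute, constant="N/A"):
    """Return copies of rows where a missing (None) `attribute` is replaced
    by a global constant such as "N/A" or "unknown"."""
    filled = []
    for row in rows:
        new_row = dict(row)  # leave the caller's rows untouched
        if new_row.get(attribute) is None:
            new_row[attribute] = constant
        filled.append(new_row)
    return filled
```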

3. Use attribute mean

Replace missing values of an attribute with the mean (or median if it’s discrete) value for that attribute in the database.

For example, in a database of US family incomes, if the average income of a US family is X you can use that value to replace missing income values.
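A minimal Python sketch of technique #3 using the standard statistics module, assuming dict rows with None for missing values; the use_median switch covers the median variant.

```python
from statistics import mean, median


def fill_with_mean(rows, attribute, use_median=False):
    """Replace missing (None) values of `attribute` with the mean, or the
    median when use_median=True, of the known values across all rows."""
    known = [r[attribute] for r in rows if r.get(attribute) is not None]
    fill_value = median(known) if use_median else mean(known)
    filled = []
    for r in rows:
        new_row = dict(r)
        if new_row.get(attribute) is None:
            new_row[attribute] = fill_value
        filled.append(new_row)
    return filled
```

With known incomes of 30000, 40000 and 500000, the mean fills in 190000 while the median fills in 40000, which is why the median is often the safer choice for skewed attributes like income.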

4. Use attribute mean for all samples belonging to the same class

Instead of using the mean (or median) of a certain attribute calculated by looking at all the rows in a database, we can limit the calculations to the relevant class to make the value more relevant to the row we’re looking at.

Let’s say you have a car pricing database that, among other things, classifies cars as "Luxury" or "Low budget", and you’re dealing with missing values in the cost field. Replacing the missing cost of a luxury car with the average cost of all luxury cars is probably more accurate than the value you’d get if you factored in the low budget cars.
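A minimal Python sketch of technique #4: the same mean imputation, but computed separately for each class. Rows are assumed to be dicts with None for missing values; the "class" and "cost" field names are illustrative.

```python
from collections import defaultdict
from statistics import mean


def fill_with_class_mean(rows, attribute, class_attr):
    """Replace a missing `attribute` with the mean of that attribute over
    rows that share the same `class_attr` value."""
    by_class = defaultdict(list)
    for r in rows:
        if r.get(attribute) is not None:
            by_class[r[class_attr]].append(r[attribute])
    class_means = {cls: mean(values) for cls, values in by_class.items()}

    filled = []
    for r in rows:
        new_row = dict(r)
        if new_row.get(attribute) is None:
            # Stays None if the class has no known values at all.
            new_row[attribute] = class_means.get(r[class_attr])
        filled.append(new_row)
    return filled
```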

5. Use a data mining algorithm to predict the most probable value

The value can be determined using regression, inference-based tools using a Bayesian formalism, decision trees, or clustering algorithms (k-means/k-medians, etc.).

For example, we could use a clustering algorithm to create clusters of rows, which will then be used for calculating an attribute mean or median as specified in technique #3.
Another example could be using a decision tree to try and predict the probable value in the missing attribute, according to other attributes in the data.

I’d suggest looking into regression and decision trees (ID3 tree generation) first, as they’re relatively easy and there are plenty of examples on the net…
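As a starting point for technique #5, here is a minimal Python sketch using the simplest of the options above: a one-variable least-squares regression fitted on complete rows and used to predict the missing target values (for instance, a missing score from another numeric attribute). Decision trees and Bayesian methods are not shown; the field names are illustrative.

```python
def fill_by_regression(rows, target, predictor):
    """Predict missing `target` values from `predictor` with a simple
    least-squares line fitted on rows where both values are known."""
    pairs = [(r[predictor], r[target]) for r in rows
             if r.get(predictor) is not None and r.get(target) is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete rows to fit a line")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    filled = []
    for r in rows:
        new_row = dict(r)
        if new_row.get(target) is None and new_row.get(predictor) is not None:
            new_row[target] = intercept + slope * new_row[predictor]
        filled.append(new_row)
    return filled
```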

Additional Notes
  • Note that methods 2-5 bias the data as the filled-in value may not be correct.
  • Method 5 uses the most information available in the present data to predict the missing value, so it has a better chance of producing less bias.
  • A missing value doesn’t necessarily imply an error in the data! Forms may contain optional fields, and certain attributes may be in the database for future use.


Facebook, Hadoop and Hive

Cloud Computing, Software Architecture, Software Development June 16th, 2009

Facebook has the second largest installation of Hadoop (a software platform that lets one easily write and run distributed applications that process vast amounts of data), Yahoo being the first. It is also the creator of Hive, a data warehouse infrastructure built on top of Hadoop.

The following two posts shed some more light on why Facebook chose the Hadoop/Hive path, how they’re doing it and the challenges they’re facing:

Facebook, Hadoop, and Hive on DBMS2 by Curt Monash discusses Facebook’s architecture and motivation.

Facebook decided in 2007 to move what was then a 15 terabyte big-DBMS-vendor data warehouse to Hadoop — augmented by Hive — rather than to an MPP data warehouse DBMS…

The daily pipeline took more than 24 hours to process. Although aware that its big-DBMS-vendor warehouse could probably be tuned much better, Facebook didn’t see that as a path to growing its warehouse more than 100-fold.

Hive – A Petabyte Scale Data Warehouse using Hadoop by Ashish Thusoo from the Data Infrastructure team at Facebook discusses Facebook’s Hive implementation in detail.

… using Hadoop was not easy for end users, specially for the ones who were not familiar with map/reduce. End users had to write map/reduce programs for simple tasks like getting raw counts or averages. Hadoop lacked the expressibility of popular query languages like SQL and as a result users ended up spending hours (if not days) to write programs for typical analysis. It was very clear to us that in order to really empower the company to analyze this data more productively, we had to improve the query capabilities of Hadoop. Bringing this data closer to users is what inspired us to build Hive. Our vision was to bring the familiar concepts of tables, columns, partitions and a subset of SQL to the unstructured world of Hadoop, while still maintaining the extensibility and flexibility that Hadoop enjoyed.
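To see the gap the quote describes, here is a minimal Python sketch of the kind of hand-written map/reduce job needed for a simple average, with the equivalent one-line GROUP BY query shown in a comment. The log format and the table and column names are hypothetical.

```python
from collections import defaultdict

# Roughly what an analyst had to hand-write for "average time per page".
# In a SQL-like language the same thing is a single query, e.g.
#   SELECT page, AVG(ms) FROM page_views GROUP BY page;   (illustrative names)


def map_view(line):
    page, ms = line.split("\t")  # assumed log format: "page<TAB>milliseconds"
    yield page, (int(ms), 1)     # emit a partial (sum, count) pair


def reduce_views(page, partials):
    total = count = 0
    for partial_sum, partial_count in partials:
        total += partial_sum
        count += partial_count
    return total / count


def average_time_per_page(lines):
    groups = defaultdict(list)
    for line in lines:
        for page, partial in map_view(line):
            groups[page].append(partial)
    return {page: reduce_views(page, partials) for page, partials in groups.items()}
```

Emitting (sum, count) pairs rather than averages keeps the result correct even if partial results are combined in stages, since an average of averages is generally not the overall average.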


Introduction to MapReduce for .NET Developers

.NET, Software Development May 6th, 2009

The basic model for MapReduce derives from the map and reduce concept in functional languages like Lisp.
In Lisp, a map takes as input a function and a sequence of values and applies the function to each value in the sequence.
A reduce takes as input a sequence of elements and combines all the elements using a binary operation (for example, it can use “+” to sum all the elements in the sequence).
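The same two ideas in Python, using the built-in map and functools.reduce (a Python illustration of the concepts rather than Lisp itself):

```python
from functools import reduce
import operator

values = [1, 2, 3, 4]

# map: apply a function to each value in the sequence
squares = list(map(lambda x: x * x, values))  # [1, 4, 9, 16]

# reduce: combine all elements with a binary operation ("+" here)
total = reduce(operator.add, squares)  # 30
```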

MapReduce, inspired by these concepts, was developed as a method for writing processing algorithms for large amounts of raw data. The amount of data is so large that it can’t be stored on a single machine and must be distributed across many machines in order to be processed in a reasonable time.
In systems with such data distribution, traditional centralized processing algorithms are useless, as just getting the data to the central CPU running the algorithm implies huge network costs and months (!) spent transferring data from the distributed machines.
Therefore, processing such massive scales of distributed data implies the need for parallel computing allowing us to run the required computation “close” to where the data is located.
MapReduce is an abstraction that allows engineers to write such processing algorithms in a way that is easy to parallelize while hiding the complexities of parallelization, data distribution, fault tolerance etc.

This value proposition for MapReduce is outlined in a Google research paper on the topic:

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.

Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.

Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google’s clusters every day.

The MapReduce Programming Model

As explained earlier, the purpose of MapReduce is to abstract parallel algorithms into map and reduce functions that can then be executed on a large-scale distributed system.
To understand this concept better, let’s look at a concrete MapReduce example: counting the number of occurrences of each word in a large collection of documents:

map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
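A runnable Python rendering of the pseudocode above, executed sequentially on one machine. The grouping loop in word_count stands in for what the framework does between the two phases; the function names are illustrative, and counts are emitted as integers rather than the string "1".

```python
from collections import defaultdict


def map_fn(doc_name, contents):
    # key: document name, value: document contents
    for word in contents.split():
        yield word, 1  # emit <word, 1>


def reduce_fn(word, counts):
    # key: a word, values: every count emitted for that word
    return sum(counts)


def word_count(documents):
    """documents: {document name: contents}. Map every document, group the
    intermediate pairs by key (the "shuffle"), then reduce each group."""
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            intermediate[word].append(count)
    return {word: reduce_fn(word, counts) for word, counts in intermediate.items()}
```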

The map function goes over the document text and emits each word with an associated value of “1”.

The reduce function sums together all the values for each word, producing the number of occurrences of that word as a result.

First we go through the mapping phase where we go over the input data and create intermediate values as follows:

  • Records from the data source (lines out of files, rows of a database, etc.) are fed into the map function as <key,value> pairs. For example: <filename, file content>
  • The map function produces one or more intermediate values along with an output key from the input

After the mapping phase is over, we go through the reduce phase to process the intermediate values:

  • After the map phase is over, all the intermediate values for a given output key are combined together into a list and fed to the reduce function.
  • The reduce function combines those intermediate values into one or more final values for that same output key

Notice that both the map and the reduce functions run on independent sets of input data. Each run of the map function processes its own data source, and each run of the reduce function processes the values of a different intermediate key.

Therefore both phases can be parallelized with the only bottleneck being the fact that the map phase has to finish for the reduce phase to start.

The underlying system running these methods takes care of:

  • Initializing a set of workers that can run tasks (map or reduce functions).
  • Taking the input data (in our case, lots of document filenames) and sending it to the workers to map.
  • Streaming values emitted by the map function to the worker (or workers) doing the reduce. Note that we don’t have to wait for a map run to finish going over an entire file before sending its emitted values to the reducer, so the system can prepare the data for the reducer while the map function is running
    (in Hadoop: sending the map values to the reducer node and handling the grouping by key).
  • Handling errors – supporting a reliable, fault-tolerant process, as workers may fail, the network can crash and prevent workers from communicating results, etc.
  • Providing status and monitoring tools.
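A toy Python sketch of the master’s role, with a thread pool on one machine standing in for a cluster of workers. It covers distributing map and reduce tasks and the map-before-reduce ordering; streaming of map output, fault tolerance and monitoring are deliberately left out, and run_job is a hypothetical name.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_job(inputs, map_fn, reduce_fn, workers=4):
    """Toy master: inputs is {key: value}; map_fn(key, value) yields
    (intermediate_key, value) pairs; reduce_fn(key, values) returns a result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map phase: each input record becomes an independent task on the pool.
        map_outputs = pool.map(lambda kv: list(map_fn(kv[0], kv[1])), inputs.items())

        # Group every emitted value by its intermediate key.
        groups = defaultdict(list)
        for pairs in map_outputs:
            for key, value in pairs:
                groups[key].append(value)

        # Reduce phase: only starts after all map results were collected;
        # each key is reduced independently, so these tasks run in parallel too.
        keys = list(groups)
        results = pool.map(lambda k: reduce_fn(k, groups[k]), keys)
        return dict(zip(keys, results))
```

For example, run_job(documents, lambda name, text: [(w, 1) for w in text.split()], lambda key, values: sum(values)) counts words across documents.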

A Naive Implementation in C#

Let’s see how we can build a naive MapReduce implementation in C#.

First, we define a generic class to manage our Map-Reduce process:

public class NaiveMapReduceProgram<K1, V1, K2, V2, V3>

The generic types are used the following way:

  • (K1, V1) – key-value types for the input data
  • (K2, V2) – key-value types for the intermediate results (results of our Map function)
  • V3 – The type of the result for the entire Map-Reduce process

Next, we’ll define the delegates of our Map and Reduce functions:

public delegate IEnumerable<KeyValuePair<K2, V2>>   MapFunction(K1 key, V1 value);
public delegate IEnumerable<V3>                     ReduceFunction(K2 key, IEnumerable<V2> values);
private MapFunction _map;
private ReduceFunction _reduce;
public NaiveMapReduceProgram(MapFunction mapFunction, ReduceFunction reduceFunction)
{
    _map = mapFunction;
    _reduce = reduceFunction;
}

(Yes, I realize I could use .NET’s Func<T1,T2,TResult> instead but that would just result in horribly long ugly code…)

Now for the actual program execution. The execution flow is as follows: We take the input values, pass them through the map function to get intermediate values, we group those values by key and pass them to the reduce function to get result values.

So first, let’s look at the mapping step:

private IEnumerable<KeyValuePair<K2, V2>> Map(IEnumerable<KeyValuePair<K1, V1>> input)
{
    var q = from pair in input
            from mapped in _map(pair.Key, pair.Value)
            select mapped;

    return q;
}

Now that we have the mapped intermediate values, we want to reduce them. The Reduce function expects a key and all of its mapped values as input, so to do that efficiently we first group the intermediate values by key and then call the Reduce function for each key.

The output of this process is a V3 value for each of the intermediate K2 keys:

private IEnumerable<KeyValuePair<K2, V3>> Reduce(IEnumerable<KeyValuePair<K2, V2>> intermediateValues)
{
    // First, group intermediate values by key
    var groups = from pair in intermediateValues
                 group pair.Value by pair.Key into g
                 select g;

    // Reduce on each group
    var reduced = from g in groups
                  let k2 = g.Key
                  from reducedValue in _reduce(k2, g)
                  select new KeyValuePair<K2, V3>(k2, reducedValue);

    return reduced;
}

Now that we have the code for both steps, the execution itself is simply defined as Reduce(Map(input)):

public IEnumerable<KeyValuePair<K2, V3>> Execute(IEnumerable<KeyValuePair<K1, V1>> input)
{
    return Reduce(Map(input));
}

The full source code and tests can be downloaded from here:

Map-Reduce Word Counting Sample – Revisited

Let’s go back to the word-counting pseudocode and write it in C#.

The following Map function gets a key and a text value and emits a <word, 1> key-value pair for each word in the text:

public IList<KeyValuePair<string, int>> MapFromMem(string key, string value)
{
    List<KeyValuePair<string, int>> result = new List<KeyValuePair<string, int>>();
    foreach (var word in value.Split(' '))
    {
        result.Add(new KeyValuePair<string, int>(word, 1));
    }
    return result;
}

Having calculated a <word, 1> key-value pair for each input source, we can group the results by word, and then our Reduce function can sum the values (which are all 1 in this case) for each word:

public IEnumerable<int> Reduce(string key, IEnumerable<int> values)
{
    int sum = 0;
    foreach (int value in values)
    {
        sum += value;
    }

    return new int[1] { sum };
}

Our program code looks like this:

NaiveMapReduceProgram<string, string, string, int, int> master = new NaiveMapReduceProgram<string, string, string, int, int>(MapFromMem, Reduce);
var result = master.Execute(inputData).ToDictionary(key => key.Key, v => v.Value);

The result dictionary contains <word, number-of-occurrences> pairs.

Other Examples

Distributed LINQ Queries. One of the POCs I’m working on using the above naive, LINQ-based implementation is running a distributed LINQ query. Imagine you have a system where raw data is distributed across several SQL Servers. We can have our map function run a LINQ-to-SQL query on multiple DataContexts in parallel (the value input for the map function, V1, can be a DataContext) and then reduce the results to a single result set. This is probably a naive/simplified version of what the guys on Microsoft’s Dryad team are doing.

Count URL Visits. Suppose you have several web servers and you want to count the number of visits to each page on your site. This works pretty much the same way as the word-counting example: the map function parses a log file and produces a <URL, 1> intermediate value, and the reduce function then sums the values for each URL and emits <URL, number of visits>.

Distributed Grep. You can run a grep search over a large number of files by having the map function emit a line if it matches a given pattern. The reduce function in this case is just an identity function that copies the supplied intermediate data to the output.
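A rough Python analogue of the distributed-query idea, swapping in in-memory SQLite databases (via the standard sqlite3 module) for the SQL Servers and plain SQL for LINQ-to-SQL. Each map task runs the same aggregate query on its own shard, and the reduce step merges the partial sums. The sales table and its columns are made up for illustration.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_shard(rows):
    """Hypothetical stand-in for one database server: an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)  # used from a worker thread
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def map_shard(conn):
    # Map: run the same aggregate query locally, close to the data,
    # so only small partial results travel back.
    return conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()


def reduce_partials(partials):
    # Reduce: merge the per-shard subtotals into one result set.
    totals = {}
    for rows in partials:
        for region, subtotal in rows:
            totals[region] = totals.get(region, 0) + subtotal
    return totals


def distributed_sum(shards):
    with ThreadPoolExecutor() as pool:
        return reduce_partials(pool.map(map_shard, shards))
```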
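A minimal Python sketch of the URL-visit count, assuming Common Log Format-style access log lines (the regular expression only pulls the request path out of the quoted request). The function names are illustrative.

```python
import re

# Assumes Common Log Format-style lines, e.g.
# 127.0.0.1 - - [10/Oct/2009:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326
REQUEST_PATH = re.compile(r'"[A-Z]+ (\S+) [^"]*"')


def map_log(server_name, log_text):
    for line in log_text.splitlines():
        match = REQUEST_PATH.search(line)
        if match:
            yield match.group(1), 1  # intermediate <URL, 1>


def reduce_visits(url, ones):
    return sum(ones)  # <URL, number of visits>


def count_visits(logs):
    """logs: {server name: log file contents}, one entry per web server."""
    groups = {}
    for server, text in logs.items():
        for url, one in map_log(server, text):
            groups.setdefault(url, []).append(one)
    return {url: reduce_visits(url, ones) for url, ones in groups.items()}
```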
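A minimal Python sketch of distributed grep. Keying each matching line by its filename is an illustrative choice; the reduce step is the identity, copying the intermediate data to the output unchanged.

```python
import re


def grep_map(filename, contents, pattern):
    # Emit a line (keyed by its file) only if it matches the pattern.
    regex = re.compile(pattern)
    for line in contents.splitlines():
        if regex.search(line):
            yield filename, line


def identity_reduce(key, lines):
    # Identity: copy the intermediate data to the output unchanged.
    return list(lines)


def distributed_grep(files, pattern):
    """files: {filename: contents}."""
    grouped = {}
    for name, text in files.items():
        for key, line in grep_map(name, text, pattern):
            grouped.setdefault(key, []).append(line)
    return {key: identity_reduce(key, lines) for key, lines in grouped.items()}
```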

Map-Reduce in the Real World

The real complexity and sophistication of MapReduce lies in the underlying system that runs and manages the execution of MapReduce jobs. Real-world MapReduce implementations, like Google’s system, Hadoop or Dryad, have to go beyond the naive implementation shown here and take care of things like resource monitoring, reliability and fault tolerance (for example, handling cases where nodes running map/reduce jobs crash or go offline due to network problems).

The following resources are worth checking out:


Developing a Robust Data Driven UI Using WPF – An Overdue Summary (and full source code)

.NET, Software Development, WPF April 15th, 2009

I wrote the stocky application more than a year ago as a research project aimed at proving that with WPF we can separate presentation metadata (XAML) from program logic. The goal was to provide the Duet team at SAP with a documented reference sample for using M-V-VM to achieve this separation.

I started documenting the proof-of-concept in a series of posts but unfortunately after leaving SAP my interests (and work) shifted away from WPF and I didn’t find the time to finish the series.

I’ve received numerous requests to release the source code, but I couldn’t do so because it was part of a larger infrastructure codebase I wrote at SAP, which adds a lot of noise to the sample (and probably adds legal issues for me sharing it).
Anyway, I took some time off this afternoon to re-write the sample independently so that I could share it:

It can be found on my SkyDrive

This, I guess, is the long overdue ending for the series:

  • Introduction – introduces the concept of M-V-VM and the reasoning behind it.
  • The DataModel – describes how to write the Model part of our application.
  • Stock DataModel Sample – provides a concrete implementation of a Stock model and its view.

However, if you’re interested in M-V-VM in WPF, there are numerous topics I didn’t get to cover that are definitely worth checking out:

Unit Testing

As I said in the introduction post, one of the most important benefits of separating the logic code from the presentation (XAML) is that it’s straightforward to unit test. In fact, my next post following the Stock DataModel Sample was going to be about unit testing, specifically how to test the DataModel and its provider, which is a bit tricky because of the use of threading.

This post is actually 99% done in the comments of the unit test code in DefaultStockQuoteProviderTest.cs in the provided source code. So do yourself a favor and go over the code. It’s short and very well documented…

Using Lambda Expression for DataBinding

Data-binding is pretty much at the heart of the M-V-VM concept, and it makes us write value converters, which is pretty tedious and annoying.
Wouldn’t it be great if we could replace writing lots of IValueConverter classes like this:

<TextBlock Foreground="{Binding Change, Converter={StaticResource StockForegroundConverter}}" … />

[ValueConversion(typeof(double), typeof(Brush))]
public class StockChangeToBrushConverter : IValueConverter
{
    public object Convert(object value, Type targetType, object parameter, CultureInfo culture)
    {
        double change = (double)value;
        if (change == 0) return Brushes.Black;
        return (change < 0) ? Brushes.DarkRed : Brushes.Green;
    }

    public object ConvertBack(object value, Type targetType, object parameter, CultureInfo culture)
    {
        return double.NaN;
    }
}

To just the following XAML statement that embeds the conversion logic:

<TextBlock Foreground="{Binding Change, Converter={ change=> if (change == 0) return Brushes.Black; return (change < 0) ? Brushes.DarkRed : Brushes.Green; }}" … />

M. Orçun Topdağı wrote an excellent series on using Lambda Expressions for data-binding in WPF to achieve just that:

Reference Applications and Guidance

I haven’t seen a lot of sample WPF LOB reference applications out there but here are some interesting links for further learning:

ASP.NET MVC RSS Feed Action Result

Software Development January 11th, 2009

Guy wrote a post about rendering an RSS feed on ASP.NET MVC using custom feed model classes and a view that renders the feed XML.

There's a better (shorter) way of achieving the same result while leveraging the syndication support built into .NET's WCF.
WCF exposes the SyndicationFeed, SyndicationItem and SyndicationPerson classes, which represent our data model.
To render this model, WCF also exposes the Atom10FeedFormatter and Rss20FeedFormatter classes, which can write the feed to a stream, so all we need to do is plug them into the ASP.NET MVC pipeline.
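To see what the formatters produce on their own, outside the MVC pipeline, here is a minimal sketch that renders a feed to a string through either Rss20FeedFormatter or Atom10FeedFormatter. It assumes the System.ServiceModel.Syndication types are referenced (on .NET Core and later they ship in the NuGet package of the same name); FeedRenderingSketch and its two methods are hypothetical helper names.

```csharp
using System;
using System.Collections.Generic;
using System.ServiceModel.Syndication;
using System.Text;
using System.Xml;

public static class FeedRenderingSketch
{
    // Builds a one-item sample feed (hypothetical titles, IDs and URLs).
    public static SyndicationFeed CreateSampleFeed()
    {
        var feed = new SyndicationFeed("Test Feed", "This is a test feed",
            new Uri("http://Contoso/testfeed"), "TestFeedID", DateTimeOffset.Now);
        var item = new SyndicationItem("Test Item", "This is the content for Test Item",
            new Uri("http://Contoso/ItemOne"), "TestItemID", DateTimeOffset.Now);
        feed.Items = new List<SyndicationItem> { item };
        return feed;
    }

    // Writes the feed through any SyndicationFeedFormatter (RSS 2.0 or Atom 1.0)
    // into a string instead of an HTTP response stream.
    public static string Render(SyndicationFeedFormatter formatter)
    {
        var output = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(output, new XmlWriterSettings { Indent = true }))
        {
            formatter.WriteTo(writer);
        }
        return output.ToString();
    }
}
```

Because both formatters derive from SyndicationFeedFormatter, an Atom flavor of a feed result would only need to swap the formatter and send application/atom+xml as the content type.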

The ASP.NET MVC framework introduces the concept of returning an ActionResult instance from controller actions.
This ActionResult object describes what the action produces (a view to render, a URL to redirect to, another action/route to execute, etc.).

ASP.NET MVC ships with several Action Results:

  • ContentResult – Simply writes the returned data to the response.
  • EmptyResult – Returns an empty response.
  • HttpUnauthorizedResult – Returns an HTTP 401 status code for unauthorized access.
  • JsonResult – Serializes the returned data to JSON.
  • RedirectResult – Redirects to another URL.
  • RedirectToRouteResult – Redirects to another controller action.
  • ViewResultBase (abstract) – Renders HTML content as the result.
    • PartialViewResult (inherits from ViewResultBase) – Renders a partial HTML response.
  • BinaryResult (abstract) – Returns a binary response.
    • BinaryStreamResult (inherits from BinaryResult) – Writes a binary stream as the result.

So, to return a feed, all we need to do is derive our own class from the abstract ActionResult class:

// ActionResult as declared in System.Web.Mvc (shown for reference only).
public abstract class ActionResult
{
    protected ActionResult() { }

    public abstract void ExecuteResult(ControllerContext context);
}

We just override the ExecuteResult method and write our feed to the HTTP output stream using Rss20FeedFormatter:

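using System.ServiceModel.Syndication;
using System.Web.Mvc;
using System.Xml;

public class RssActionResult : ActionResult
{
    public SyndicationFeed Feed { get; set; }

    public override void ExecuteResult(ControllerContext context)
    {
        // Tell the client it is receiving an RSS document.
        context.HttpContext.Response.ContentType = "application/rss+xml";

        // Serialize the feed as RSS 2.0 straight into the response's output writer.
        Rss20FeedFormatter rssFormatter = new Rss20FeedFormatter(Feed);
        using (XmlWriter writer = XmlWriter.Create(context.HttpContext.Response.Output))
        {
            rssFormatter.WriteTo(writer);
        }
    }
}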

Now we can simply return an RssActionResult from our controller's action.

Here’s a simple example:

public ActionResult Feed()
{
    SyndicationFeed feed =
        new SyndicationFeed("Test Feed",
                            "This is a test feed",
                            new Uri("http://Contoso/testfeed"),
                            "TestFeedID",
                            DateTime.Now);

    SyndicationItem item =
        new SyndicationItem("Test Item",
                            "This is the content for Test Item",
                            new Uri("http://Contoso/ItemOne"),
                            "TestItemID",
                            DateTime.Now);

    List<SyndicationItem> items = new List<SyndicationItem>();
    items.Add(item);
    feed.Items = items;

    return new RssActionResult() { Feed = feed };
}

… and that’s it!

A more elegant solution that leverages existing framework capabilities.

99 Ways to Become a Better Developer

Software Development, Tips December 5th, 2008

I came across this post during my weekend reading. 91 Surefire Ways to Become an Even Greater Developer is a comprehensive guide linking to all sorts of blog posts with insights on improving your skills as a developer.

While the list is very long and sometimes debatable, it does have some interesting pointers. If you do nothing else, delve into item #8, Learn Programming by Not Programming, which refers to the following post by Jeff Atwood.

The topic in question is why some developers outperform their peers regardless of their accumulated experience:

But the dirty little secret of the software development industry is that this is also true even for people who can program: there’s a vast divide between good developers and mediocre developers.

A mediocre developer can program his or her heart out for four years, but that won’t magically transform them into a good developer. And the good developers always seem to have a natural knack for the stuff from the very beginning.

The answer lies in this quote from Bill Gates' remarks:

“The nature of these jobs is not just closing your door and doing coding, and it’s easy to get that fact out. The greatest missing skill is somebody who’s both good at understanding the engineering and who has good relationships with the hard-core engineers, and bridges that to working with the customers and the marketing and things like that.”

Eric Sink makes the distinction even clearer in You Need Developers, Not Programmers, contrasting Programmers, who are excited only about writing code and basically care about nothing else, with Developers, who contribute to the software product in many ways.

The Great Programmer\Hacker Stereotype

You all know that guy (hell, most of us were that guy when we just started out, I know I was) – he has great technical skills, likes writing code and can spend hours in his IDE writing code that'll make most of us scratch our heads. Yet he views the world in only one dimension – code. Business? That's for the managers to figure out. Sales\Marketing? Annoyances for others to take care of. Documentation? But the code is so obvious… Builds? Deployment? Configuration? …

Passion for code is a great quality. But as a specialist, it's all too easy to dig yourself deeper and deeper into a skill you've already proven yourself capable at, when you'd be better off using the time to cultivate the other skills that are part of making software. Over time, that narrow focus can render you obsolete…

The great hacker is a one-trick pony – he writes great code, but that's about it…
Most of these guys end up working alone as consultants or freelancers, where they don't have to care about that other stuff, or they end up as programmers at some big firm where there's more room for specialists doing specific jobs (Architects do architecture, PMs do project management, Programmers code…).
On the other hand, those who truly like making software open up to the other aspects of software development.
When that change in mindset happens, that's when you can truly grow exponentially…

So what do I do?

OK, I guess you get the point… But how do you get started? Here are my own 5 cents on the topic…

Read, Read and Read Some More…

We're in an industry that is moving forward at a fast pace. Technology becomes obsolete every year and a half or so, and as developers we constantly struggle to keep up. Books are great not only for keeping up but also for expanding your knowledge into other fields.
There are plenty of interesting books and blogs about, well, pretty much everything.
Here are some recommendations to get you started:

  • The Inmates Are Running the Asylum
  • The Pragmatic Programmer
  • Made to Stick
  • Crossing the Chasm
  • The Innovator's Dilemma
  • Eric Sink on the Business of Software (most of it is available online here)

Oh, and one word about programming books: the best ones are timeless, transcending the choice of language, IDE and platform.
I try to stay away from those thick, heavy, language\platform-specific references: most of them go out of date after a year or so anyway, and most of their information can easily be found elsewhere online (Google, the product's docs, blogs…)

Most big programming books are just a waste of your time (and money…)

Contribute to an Open Source Project

Back in the days of Delphi, I was involved in Project JEDI, which was dedicated to exposing various APIs (especially the Win32 API) to Delphi developers.
I learned a lot working with the JEDI code base, documentation, samples and other team members.
Later, when it was time to be drafted into the Israeli Army (we all have to do it at 18 here), the experience, credit and code samples helped me land a (very) exclusive position as a programmer. Who knows where I'd be today if I hadn't qualified and had to serve as a combatant…

Contributing to an open source project is a great way to gain experience, learn and get better.
There are no job interviews to pass, no degree requirements and no commitment to working hours or a schedule – you can just join in and start submitting patches, or contribute in ways other than code (bug reports, docs, support, …).

You can learn a lot just from studying the code and interacting with your peers…

Contributing to open source shows dedication and passion – it's a walking, talking resume.

Get a Mentor

Find yourself a mentor or mentors who can teach you about different aspects of the business. I've had several at SAP, and talking with them proved invaluable (if you're reading, thanks! :) )

It doesn't have to be official mentoring that's part of the person's goals or job description. Many of your peers are experts in their fields, and they'll be happy to show you around if you just show some interest…

Become a Mentor

Great developers are eager to learn… and to teach. Can you pass your passion and knowledge on to others?

You can also…

  • Start a blog about your experiences, opinions, etc.
  • Start answering questions at stackoverflow.com and collect achievements

Land an Internship

Try getting an internship in a different role. When I was at SAP, they had a special program allowing employees to apply for a ~6-month position somewhere within the company. The idea was to get employees familiar with different aspects of the company. Maybe product management, marketing or sales is not really your first choice of profession, but why not try it for a couple of months without the risk of going through a career change? How cool is that? I'm sure many large corporations have something similar, and even if yours doesn't, it can't hurt to come to your boss with such an interesting offer…

Own a Product Area

Take ownership of some part of the product your team is working on. Whether it's a specific component or a vertical (like Security), you should be in charge of getting it done – from nailing down the definition with the product\sales\business team, through UX, development, QA, etc…
There's no better way to learn about the software development process than experiencing the entire cycle…

Innovate

Start something new. When working on Duet, we had many issues getting the thing deployed. So I made a tool (mainly for myself) for our QA and RIG (regional implementation group – the guys who work with customers) to help diagnose problems. This later became the official Duet Support Tool and got its own dedicated development time. Are your product and development environment perfect? I'm sure they're not… find a need and fill the gap…

Why? If owning a product area teaches you about the entire development cycle, innovating teaches you about defining something new and "selling" it to the team…

Bonus Reading…

Another link worth visiting is the one about the Metrosexual Developer. Funny and true… ;)

The Dark Side of LINQ

.NET August 5th, 2008

I've had mixed feelings about LINQ for quite some time now.
Sure, it can make working with data sources a lot easier, and it can definitely save a lot of code…
But look at what happens when the following C# foreach statement

List<KeyValuePair<string, string>> resultList = new List<KeyValuePair<string, string>>();
string[] paramsArray = parameters.Split(new char[] { '&' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string p in paramsArray)
{
    int index = p.IndexOf('=');
    if (index > 0)
    {
        string key = p.Substring(0, index);
        string value = p.Substring(index + 1);
        resultList.Add(new KeyValuePair<string, string>(key, value));
    }
}

// Note: System.Linq's Distinct has no overload taking a lambda; this call
// assumes a custom Distinct(Func<T, T, bool>) extension method.
IEnumerable<KeyValuePair<string, string>> result =
    resultList.Distinct((p1, p2) => p1.Key == p2.Key);

Turns into this query:

// Distinct(lambda) here likewise assumes a custom extension method.
var distinctPairs = (from keyValuePair in parameters.Split(new char[] { '&' }, StringSplitOptions.RemoveEmptyEntries)
                     let index = keyValuePair.IndexOf('=')
                     where index != -1
                     let key = keyValuePair.Substring(0, index)
                     where !string.IsNullOrEmpty(key)
                     let valueText = keyValuePair.Substring(index + 1)
                     select new { Key = key, ValueText = valueText })
                             .Distinct( (p1, p2) => (p1.Key == p2.Key) )
                             .ToArray();
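Both snippets call Distinct with a comparison lambda, but System.Linq's Distinct only accepts an IEqualityComparer<T>, so the code relies on a custom extension method that the post doesn't show. A minimal sketch of what such an extension might look like, assuming "equal" simply means the lambda returns true; DistinctExtensions and its parameter names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of System.Linq: a Distinct overload that takes
// an equality lambda, so calls like list.Distinct((p1, p2) => p1.Key == p2.Key) compile.
public static class DistinctExtensions
{
    public static IEnumerable<T> Distinct<T>(this IEnumerable<T> source, Func<T, T, bool> areEqual)
    {
        // Validate eagerly, before the lazy iterator below starts running.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (areEqual == null) throw new ArgumentNullException(nameof(areEqual));
        return DistinctIterator(source, areEqual);
    }

    private static IEnumerable<T> DistinctIterator<T>(IEnumerable<T> source, Func<T, T, bool> areEqual)
    {
        // A bare equality lambda gives us no hash code, so each item is compared
        // against every item already yielded (O(n^2) overall).
        var seen = new List<T>();
        foreach (T item in source)
        {
            if (!seen.Any(previous => areEqual(previous, item)))
            {
                seen.Add(item);
                yield return item; // keeps the first occurrence, like Enumerable.Distinct
            }
        }
    }
}
```

On .NET 6 and later, the built-in DistinctBy(p => p.Key) expresses the same intent with hashing, so it scales better than this linear-scan sketch.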

I don't know about you, but I find the first version a lot more approachable, readable and quicker to understand. The LINQ version is not any shorter, and it simply looks Evil.

LINQ is like the Force… It can be used to write wonderful code that is simple and functional, but it also has the potential to produce cryptic code that's hard to maintain.

Use it wisely and don't be tempted by its dark side…
