Tuesday, January 15, 2013

.NET File System and Solution Organization

The file system is the lowest level of organization possible for source code. It’s like the dirt below the foundation of a structure. It may not always be at the forefront of an engineer’s mind, but if the dirt is not stable, the foundation will crack as the ground shifts beneath it. Implementing a standard and strategy for organizing the files impacts the stability and strength of the software built on top.

The Common State

Figure 1. Example root of Main.
Briefly, let’s look at an example source code repository root, call it Main. Figure 1 is the example root of Main. It isn't clear exactly where to start unless of course you are interested in UML – otherwise, it requires digging.

Solutions

Visual Studio solutions play at least 3 important roles:

  • A view and organization into code/projects
  • A direct view of referential integrity between projects
  • A buildable collection of projects (that can be automatically built by TeamCity or other CI tools)

In the example repository, solutions in source control are ad hoc.  Their location within the directory structure and the projects which they combine represent no clear pattern.  This makes the latter 2 of the 3 roles above very difficult to accomplish.   To take advantage of all 3 roles, the file system layout and the creation of solutions must follow an enforced standard. Additionally, the file system layout should yield the composition of the solutions and reflect the architectural patterns inherent in the software itself.

Since solutions play such a big part in organizing the software, new solutions should be well thought out and should have a purpose beyond simply providing a place to view and edit code.  When too many solutions exist they need to be redefined and reduced.   Before discussing how to do that, let’s go one level lower and look at how projects and code files are organized.

Projects

Visual Studio projects serve, at minimum, the following purposes:

  • Defining an assembly name and namespace
  • Organizing files (classes, resources, etc.)
  • Defining required references
Figure 2. Variances in project naming.

The primary purpose of a project is to define an assembly name and namespace.  In many legacy repositories, projects do not follow a standard pattern or convention for naming. Figure 2 shows a few of the variances in project naming that exist in the example repository.

Projects can be named differently from their assemblies since there is no enforcement built into the naming.  If the project name is the same as the assembly name, it is easy to understand where the code physically lives in the file system.  The assembly name would typically also be the root namespace of the code within.  This is best practice and is the default when creating new projects, but again, it is not enforced.  As seen in Figure 3, project properties define the assembly name and the base namespace.  Consistency across these helps to keep things organized and clear.

Figure 3. Project properties define the assembly and base namespace.
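
Since nothing enforces this consistency, a small script can check it. The following is a minimal sketch in Python, not part of any Visual Studio tooling. It assumes that the project file name (without extension) is the name both AssemblyName and RootNamespace should carry; the function name naming_mismatches and the Acme.* names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# XML namespace used by classic (non-SDK) MSBuild project files.
MSBUILD_NS = "{http://schemas.microsoft.com/developer/msbuild/2003}"


def naming_mismatches(project_file_name, project_xml):
    """Return (property, actual value) pairs that differ from the project name.

    Assumption for this sketch: the project file name without its extension
    is the expected value for both AssemblyName and RootNamespace.
    """
    expected = Path(project_file_name).stem
    root = ET.fromstring(project_xml)
    mismatches = []
    for prop in ("AssemblyName", "RootNamespace"):
        # Look for the namespaced element first, then the bare element.
        node = root.find(f".//{MSBUILD_NS}{prop}")
        if node is None:
            node = root.find(f".//{prop}")
        if node is None:
            # Property not declared; this sketch does not flag that case.
            continue
        actual = (node.text or "").strip()
        if actual != expected:
            mismatches.append((prop, actual))
    return mismatches
```

Running this over every .*proj file in a repository gives a quick inventory of the projects whose properties need to be fixed.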

There is also a file in every project called AssemblyInfo.cs that describes the assembly once built.  As seen in Figure 4, these values should match that of the project name and properties.  Although it’s beyond the scope of information here, the proper company name and copyright information should also be maintained in this file.

Figure 4. AssemblyInfo.cs should be consistent with the project name and properties.

Project References vs. Assembly References

Project references have obvious advantages in .NET: they enable rapid refactoring, dependency searching, and navigation. Ideally, a single solution that contains all the projects for a given source repository will provide the best overall experience in .NET development. In some cases this cannot occur simply because of the scale of the source code.  In other cases, it’s not the size of the source code itself, but the sheer number of projects that makes a single solution infeasible. In these situations separate solutions will be required, and a pattern for using project vs. assembly references will therefore fall out of the organization of the solutions themselves.
Figure 5. Project references vs. internally built vs. 3rd party.

An important rule to remember: a reference to code that is part of a project in the same solution should always be a project reference. This is shown with the highlighted projects in Figure 5. The blue underlined reference is an assembly reference to an internally built project that is in a common build folder. The black underlined reference is a 3rd party binary reference that is in the SharedBinaries folder.
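
The rule can be checked mechanically. The sketch below, in Python, scans one project file for assembly references whose name matches a project in the same solution. It assumes that assembly names equal project names and that the caller supplies the set of names in the solution; the function name and the Acme.* names are hypothetical.

```python
import xml.etree.ElementTree as ET


def misplaced_assembly_references(project_xml, solution_project_names):
    """List assembly references that point at a project in the same solution.

    By the rule above, each returned name should be a project reference.
    """
    root = ET.fromstring(project_xml)
    flagged = []
    for element in root.iter():
        # Strip any XML namespace: "{ns}Reference" -> "Reference".
        tag = element.tag.rsplit("}", 1)[-1]
        if tag != "Reference":
            continue  # ProjectReference and other elements are fine
        # Include is either "Name" or "Name, Version=..., Culture=...".
        simple_name = element.get("Include", "").split(",")[0].strip()
        if simple_name in solution_project_names:
            flagged.append(simple_name)
    return flagged
```

References to internally built assemblies from other solutions and to 3rd party binaries are left alone, since their names are not in the set.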

Further Reading

The topics covered here can be further investigated on MSDN.  Below are a few resources that provide suggestions on file system layouts.  The layouts depend on the team, the software, the source control and more.  The ideas and recommendations extracted here are based on all these influences, but also with the clear intention of having a clean organized repository.


Recommendations

The recommendations come from both experiences with many source code repositories as well as MSDN where there is an examination of the pros and cons from alternative styles.  This is a template and is very open to suggestion and customization for the needs and preferences of the team.

File System
Figure 6. Proposed example root of Main.

Simplifying and separating the root of the example Main might yield the folders shown in Figure 6.  The following should be separated based on different types of content.

  • .NET source code
  • Non .NET source code
  • Database code
  • Documentation, diagrams and examples

The cleanest separation in distributed source control (such as Mercurial or Git) is to create individual repositories for each of the above.  For this example, TFS is the source repository.  Creating separate TFS projects tends to make things more difficult because of branching/merging concerns.  The use of folders at the root level can provide the same separation without the cross-project or cross-repository hassle that some source control systems impose.

Databases
Figure 7. Contents of /Main/Databases.

In /Main/Databases all the source code, scripts, configuration, projects and solutions would exist as shown in Figure 7. Usually, this can be organized in a single solution where all the database projects are included and organized by Solution Folders. Doing so provides a clean, manageable view into the database projects. See the Solutions Section for further information on project organization within that folder.

Not addressed here are items like custom scripts and files. These can be included in the solution itself and organized within a folder in the Source directory. This allows a developer to open the solution and have access to all the code that is part of the databases. Projects such as those for Analysis Services may be broken out into a separate solution if needed and placed in /Main/Databases. Otherwise, every developer may need to install specialized software just to open those types of projects.

Documentation

All the documentation would live in /Main/Documentation, organized so that it is clear where to find things. This may include individual projects that have code (such as examples), but mostly it will be diagrams and tutorials describing the software. This folder has a bit more flexibility in its organization since it is not source code and is not critical to the built product.

Non-.NET Languages

/Main/PHP contains only PHP code, no .NET code.  The folder is just an example of how non-.NET code can be separated out into well-organized folders.  These folders should be organized in a manner that supports the platform and IDE for that language.

Source
Figure 8. Proposed example /Main/Source.

In /Main/Source would be all the .NET code and everything necessary to build the .NET products. This folder would be sorted with sub-folders that have projects associated with a particular layer or major function. Figure 8 shows a proposed root of the example /Main/Source. Notice, the root of this directory has a few build scripts, some configuration files, but no solutions. The sub-folders contain the solution files for each of the major categories.

Figure 9. Generic sub-folder example.

Each sub-folder has the same structure, with the exception of SharedBinaries, which has its own organizational pattern.  Figure 9 shows the generic pattern for the folder structure within a sub-folder. Within the Source and UnitTests folders are all the project folders, with the .*proj project files only a single level deep. Each project should have a proper namespace and the folder name should reflect that pattern, as seen in Figure 10. In the UnitTests folder, each test project should be named identically to the source project that it is testing (the companion project), with the .Test suffix appended, as seen in Figure 11.

Figure 10. Project directories in Source.
Figure 11. Test Project directories in UnitTests.

Each sub-folder in /Main/Source has a single solution.  No other solutions should be created unless a new major subcategory is defined. With this distribution of projects into subcategories there will usually be a reasonable number of projects in each category. Within the solution, the projects can be grouped using solution folders, which are described further in the Solutions Section. All content in a sub-folder's Source or UnitTests folders should be part of the solution. Any content that is not part of the solution should either be removed from source control or added to the solution appropriately. New projects will be added to existing solutions and will need to be placed into the Source or UnitTests folders based on the project type.
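
Finding project files that are in source control but not in the solution can be automated. This Python sketch reads the Project(...) lines of a .sln file and compares them with a list of project files found on disk. It assumes paths are compared relative to the solution file and case-insensitively; the function name is hypothetical.

```python
import re

# Matches: Project("{type-guid}") = "Name", "relative\path.csproj", "{guid}"
# and captures the relative path.
PROJECT_LINE = re.compile(
    r'^Project\("\{[^}]+\}"\)\s*=\s*"[^"]+",\s*"([^"]+)"', re.MULTILINE
)


def _normalize(path):
    # Solution files use backslashes; compare case-insensitively.
    return path.replace("\\", "/").lower()


def projects_missing_from_solution(sln_text, project_files_on_disk):
    """Return the project files on disk that the solution does not include."""
    in_solution = {
        _normalize(path)
        for path in PROJECT_LINE.findall(sln_text)
        # Solution folders also appear as Project(...) entries; skip them.
        if path.lower().endswith("proj")
    }
    return sorted(
        path for path in project_files_on_disk
        if _normalize(path) not in in_solution
    )
```

Anything the function returns is either an orphan to remove from source control or a project that should be added to the solution.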

Solutions
Figure 12.  A proposed solution.

The solutions should properly organize and separate the projects using solution folders that accurately reflect the namespaces. This is possible in most, but not all, cases. Sometimes, for example, there will be a need for a Common folder that holds a mishmash of projects. Grouping projects by namespace enables quick navigation to the projects of interest.

Figure 12 shows a proposed solution. Notice how the Services and Repositories are broken out, with the test projects in their own Services.Test and Repositories.Test folders respectively. This simplifies the programmer's view when coding by reducing the number of projects in any particular folder. Since tests are developed separately from the main source code, it simplifies the overall view and navigation. Notice how in Web this is not the case. When the number of projects is reasonable, it is visually easier to leave each test project alongside its companion project.

Long Term Goal
Figure 13. A long term goal. Main.sln.

Although the source code in many repositories may be rather large, many times this is because of its current organizational patterns rather than the actual lines of code and complexity of the application. After an initial effort of organizing the file system and solutions, there will be an opportunity to simplify and reduce the number of projects, clean up namespaces and perform other organizational tasks that will simplify the overall source code.  In the long term, it may be possible to have a Main solution, as shown in Figure 13, which has all the .NET projects included (excluding databases).  Obviously, an enormous number of projects in a single solution is not appropriate, which is why the number of projects has to be reduced first. If an organization is able to achieve this goal, there are extraordinary benefits to programmers and the business in the ability to implement new features and maintain the source code.



Thursday, December 13, 2012

Messy Source Code Repositories

It goes without saying, a messy work-space impacts the ability to be organized and perform well. Everyone has messes, but in a common space organizing and maintaining that cleanliness is essential.

When and Why?

This process should be started immediately in any organization because it will save time and money by increasing the pace of development. Rarely in a company is there an opportunity for a software re-write. Technical debt is often put off. The time to make the corrections is every day and then continuously (aka continuous improvement). New processes are typically put into place to manage the extensive mess. These processes potentially only add a layer on top of the existing mess and therefore are complex and generally lack patterns. This must be corrected at the foundation.

Figure 1. A hard copy mess with many similarities to unkempt repositories.

Typical Findings

In a comprehensive inventory of messy source code repositories common findings include:

File System

  • Extensive irregular source code filing
  • More than a few empty or junk (new and unchanged) source code projects, solutions, and directories
  • Committed bin and obj directories and files
  • Intermingled mixed platform and language source code
  • Mixed documentation and source code
  • Inconsistent and duplicate storage of 3rd party code and binaries

Source Code
*Restricted to .NET high-level organizational patterns only.

  • Namespaces not consistent with location
  • Namespaces missing
  • Class names not consistent with file name
  • Multiple classes in the same file
  • Source code files missing from projects
  • Inconsistent assembly and project reference patterns
  • Ad hoc solution creation and composition
  • Assembly name/project name mismatch

Starting Recommendation

This is just a start. A starting place must be carved out before the next layer of organization, standardization and patterns can be addressed. These recommendations target only high-level organizational patterns and standards.

Step 1

  1. Define, implement and enforce a pattern for source code storage
  2. Separate source code based on platform and language (.NET vs. non-.NET)
  3. Separate documentation and diagrams
  4. Remove any compiled code and add ignore statements to source control for those files and directories
  5. Collapse, consolidate and organize 3rd party code and binaries
  6. Remove all unused projects, solutions, files, and folders
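
For item 4, a listing of the files under source control can be scanned for build output. The Python sketch below flags any path that passes through a bin or obj directory. It assumes the caller already has a list of repository-relative file paths (how that list is obtained depends on the source control system); the function name is hypothetical.

```python
from pathlib import PurePosixPath

# Directory names that hold compiled output in a default .NET build.
BUILD_OUTPUT_DIRS = {"bin", "obj"}


def build_output_paths(tracked_file_paths):
    """Return the tracked files that live under a bin or obj directory."""
    flagged = []
    for raw in tracked_file_paths:
        parts = PurePosixPath(raw.replace("\\", "/")).parts
        # Only directory segments count; the last part is the file name.
        if any(part.lower() in BUILD_OUTPUT_DIRS for part in parts[:-1]):
            flagged.append(raw)
    return flagged
```

The flagged files are candidates for removal, and their parent directories are candidates for ignore entries.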

Step 2

  1. Define, implement and enforce a solution (.sln) pattern
  2. Install and enforce the usage of JetBrains ReSharper for all .NET developers
  3. Fix project properties to match assembly name and namespace
  4. Fix namespaces to match location
  5. Fix class names that do not match the file name
  6. Define and standardize assembly reference policies
  7. Clean and standardize all AssemblyInfo.cs code files
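
Items 4 and 5 can be inventoried before anyone fixes them by hand. The Python sketch below is a rough regex-based heuristic, not a C# parser, so comments or strings containing the word "class" can produce false findings. It assumes the common convention that a file's namespace is the project's root namespace plus its folder path; the function name and the Acme.* names are hypothetical.

```python
import re
from pathlib import PurePosixPath

NAMESPACE_RE = re.compile(r"^\s*namespace\s+([\w.]+)", re.MULTILINE)
TYPE_RE = re.compile(r"\b(?:class|interface|struct|enum)\s+(\w+)")


def check_source_file(root_namespace, relative_path, source_text):
    """Return a list of finding codes for one C# source file.

    relative_path is relative to the project folder, for example
    "Repositories/UserRepository.cs".
    """
    path = PurePosixPath(relative_path.replace("\\", "/"))
    # Assumed convention: root namespace + one segment per folder.
    expected_namespace = ".".join([root_namespace, *path.parent.parts])
    findings = []

    declared = NAMESPACE_RE.search(source_text)
    if declared is None:
        findings.append("namespace-missing")
    elif declared.group(1) != expected_namespace:
        findings.append("namespace-mismatch")

    type_names = TYPE_RE.findall(source_text)
    if path.stem not in type_names:
        findings.append("type-name-mismatch")
    if len(type_names) > 1:
        findings.append("multiple-types")
    return findings
```

The finding codes map directly onto the Typical Findings list: missing namespaces, namespaces not consistent with location, class names not consistent with the file name, and multiple classes in the same file.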

Next Steps

After completion of the first 2 steps, attention can turn to the code itself. In the next steps it’s important to standardize using patterns. This includes class and project structure, reduction in code duplication, collapsing of projects, namespace cleaning and standardization, removal of legacy or unused code blocks, removal of unused references, standardization of comments, etc. These steps are necessary to get to a cleaner working base. A clean working base will accelerate the integration of new code, designs, patterns, bug fixes, and unit testing, and will shorten the ramp-up time for new engineers.

Wednesday, November 28, 2012

Encountered a fun error: Failed to start up socket within 45000

Encountered a fun error: Failed to start up socket within 45000.

Firefox had updated itself to version 17. The Selenium version is 2.21.0.0 for the following DLLs:

  • Selenium.WebDriverBackedSelenium.dll
  • ThoughtWorks.Selenium.Core.dll
  • WebDriver.dll
  • WebDriver.Support.dll

The solution is as follows:

  1. Uninstall updated version of Firefox
  2. Download and install Version 14 of Firefox. (Other versions may also work, but this worked for me)
  3. Launch Firefox and in options make the following changes:


Monday, October 29, 2012

In Response to TRUSTe: Mobile Tracking: How it works and why it’s different

In response to TRUSTe: Mobile Tracking: How it works and why it’s different.  The article is a decent summary of the current state of mobile advertising. There are two things that I think are not well addressed in the article or are in some ways misguided. The first is the concept of fingerprinting on a mobile device. Those of us deeply involved in this area understand there isn't additional entropy in a mobile print that allows us to differentiate between 2 devices with the exact same OS. In other words, a fingerprint in mobile (beyond cookies and local storage) is at best IP/UA, degrading significantly from there because IP is not reliable in most cases, often switching intra-session while on mobile networks rather than WiFi. The second misconception (or lack of clarity) is regarding privacy and tracking. DNT does not mean “do not track” to all market players, and so it does not mean they will stop monitoring behavior or collecting data. For example, Yahoo recently announced it would purposefully ignore DNT in IE10. Many companies are interpreting it as they please, such as: don’t target but collect data as needed, or, do everything as normal and forward along the DNT value to let the downstream guy handle it or interpret it as he wants. The claim at the end of the article that enabling a “persistent identifier” provides “better privacy” is not true. Enabling a “persistent identifier” enables better tracking and less privacy, but for the companies who choose to respect DNT, it is true that it will help with serving non-targeted ads.

Tuesday, September 25, 2012

Take What You Can Get - From Form Fills

Many times I've started to fill something out online and decided, ugh, not now, or, there are way too many questions here, forget it.  Now, let's just say you decide to close your pof account because really there isn't anyone attractive anyway, and up pops a survey so you can tell them what you really think.  Halfway through this scroller of a survey you decide, over it, just wanted to let them know it sucked, and being the good citizen that you are you scroll to the bottom and click submit. What next? An error? Seriously? The error is asking you to fill out the missing questions.  HA! Okay, that will never happen.  What's the lesson here? Take what you can get, from form fills.

As a web designer, put the submit button at the bottom of the page as normal, giving the persistent user their reward.  Asynchronously collect the data after each question is answered by the user. Send it to the server and cache it in the session, stuff it in a database, just get the data somewhere so it can be analyzed.  If 20% of the users fill out a third of the questions, 40% fill out half, and the other 40% complete it, then you capture about 67% of the possible answers instead of the 40% you get from completed surveys alone, a lift of nearly 27 points simply by changing how you process the data.  Put the most important question first and you're probably looking at a 99% answer rate for that question from anyone who decides to even touch the survey.  This is a no-brainer, and everyone should take what they can get from form fills.
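
The arithmetic behind that lift can be checked in a few lines of Python. This is only a worked calculation using the percentages from the paragraph above; it assumes that without partial capture only fully completed surveys are stored.

```python
def average_answer_rate(segments):
    """segments: (share of users, fraction of questions answered) pairs."""
    return sum(share * answered for share, answered in segments)


# The three groups of users from the example.
segments = [(0.20, 1 / 3), (0.40, 1 / 2), (0.40, 1.0)]

# Capturing each answer as it is given keeps the partial responses.
with_partial = average_answer_rate(segments)

# Assumption: submit-only processing keeps just the completed surveys.
submit_only = sum(share for share, answered in segments if answered == 1.0)

lift_points = (with_partial - submit_only) * 100
print(f"{submit_only:.1%} -> {with_partial:.1%} (+{lift_points:.1f} points)")
# prints "40.0% -> 66.7% (+26.7 points)"
```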

Monday, September 17, 2012

The Art of Database Selection - Part 1

Selecting a database for the job is more of an art than a science.  That is, aside from the obvious misuse of a database for a particular job, the selection may not have everything to do with the technology or perceived cost savings in licenses etc.   Take for example a company's decision to switch databases because of impending or increased license costs.  The transition to a new database is typically well planned, and the tradeoffs of the development changes required to support it are meticulously studied and documented.  What happens when something goes wrong?  Say the performance isn't at the level that was expected.  What if it's not even close?  Teams and company dollars can get swept into managing unforeseen transitional costs, which can include support contracts, hardware changes, engineering investigation, testing, and additional changes, all of which delay the primary goals of the company. Overall, the cost may far outweigh that of staying on the original database solution and absorbing the increased cost of doing business-as-usual.  So when should a team make that leap?  This is part of the art of database selection.  The switch must (at minimum) yield significant benefits or provide a solution that is not available otherwise. It should do at least one of the following:

  • Solve a computational problem unable to be solved in a reasonable amount of time by the former database.  Caution here, mere performance increases can be challenging to achieve and should be approached with great reserve.
  • Provide a radical reduction in costs where the prerequisite should be existing team expertise and experience with 2 or more engineers on the new database including hardware knowledge and DBA skills: ideal hardware configuration for the job, replication, backup strategies, tuning, indexes, etc.
  • A change in use-case of the data such as archiving large datasets into an alternative cheaper solution.  In this case non-production critical data is the best candidate.
Databases are an art and sometimes it is best to stick with what you know and other times it's worth the risk.  Just be sure if you are going to take the leap it's going to be worth the possible challenges.  Sounds like life.

Thursday, February 18, 2010

Overloading Properties with Visibility Modifiers

In C#, I keep running into instances where I would like to protect code blocks or choose the code execution path based on a visibility modifier. Programming against an execution sequence based on the calling code location (visibility) could prove to be extremely useful and create safer, more readable code. I can find many instances where I program a single public property to act a certain way given the "state" of the object. This is very typical in shared libraries that are used both server side and client side. An object that is transported and serialized can act differently based on its location, server or client side. There are plenty of other uses for overloading a method with visibility; this is just one example.

Of course there would need to be an implicit notion of a "next least visible" or "matching visibility" execution path. An internal calling sequence should be directed to an "internal property" if one exists. If it does not exist, the next least visible (but greater than the current) path should be chosen. It should also be possible to specify explicitly, at the call, which visibility path is intended. For example:

myObject.ID.public = 123;
myObject.ID.internal = 123;

The execution path for a property overloaded using a visibility modifier is a static (compile time) decision. The following code block is an example of an overloaded property in C#.

/* NOTE: THIS IS NOT LEGAL C#, causes property already defined error */

public int ID
{
    get
    {
        return data.ID;
    }
    internal set
    {
        data.ID = value;
    }
    set // assumed public
    {
        modified.ID = value;
    }
}

Thursday, January 14, 2010

iPhone Sync'd with Google Calendar

Syncing the iPhone with Google is really easy and allows the organization of contacts, calendars and email (of course). By simply adding your Gmail account as an Exchange Server account on the iPhone, the option of syncing becomes available. If you use Google Voice it is really nice to have the contacts synced as soon as you enter them on either the iPhone or on the web. See http://www.google.com/mobile/sync/

Wednesday, February 4, 2009