LINQ - Language Integrated Query

LINQ
The LINQ Project is a codename for a set of extensions to the .NET Framework that encompass language-integrated query, set, and transform operations. It extends C# and Visual Basic with native language syntax for queries and provides class libraries to take advantage of these capabilities.

My bachelor's degree graduation project 

As a result of my graduation project in the computer engineering course I ended up with a concise document describing the idea behind LINQ. The document is available only in Portuguese so that I think it's a valuable source of information to people that know Portuguese given the fact that great material about LINQ is only available in English.

In addition to the intrinsic subjects related to the integration of the query language (SQL) into the programming language (C#), in this paper you'll also find information about the great language extensions that form the base of LINQ:

  • Generics
  • Anonymous methods
  • Iterators
  • Partial types
  • Nullable types
  • Query expressions
  • Automatically implemented properties
  • Implicitly typed local variables
  • Extension methods
  • Partial methods
  • Lambda expressions
  • Object initializers
  • Collection initializers
  • Anonymous types
  • Implicitly typed arrays
  • Expression trees

See the paper's abstract below (English/Português):

ABSTRACT

Macaferi, Leniel Braz de Oliveira. Query language integrated into the programming language. 2007. 96f. Monograph (bachelor’s degree in Computer Engineering) - Barra Mansa University Center, Barra Mansa, 2007. www.ubm.br

Data is the raw material of computation and is processed via software. Software products are generally structured in tiers, typically three, the data tier, the middle or business tier and the presentation or client tier. Each of these tiers has its own data model. These different paradigms cause the impedance mismatch problem between these three disparate models.

Instead of trying to unify at the data model level, a better approach is to unify at the level of algebraic operations that can be defined the same way over each data model. This allows us to define a single query language that can be used to query and transform any data model. All the data model need to do is to implement a small set of standard query operators, and each data model can do so in a way natural to itself.

The industry has reached a stable point in the evolution of object-oriented (OO) programming technologies. Programmers now take for granted the facilities of oriented programming languages and their features like classes, objects, methods and events. Such languages support the creation and use of higher order, functional style class libraries. The support is the result of new language extensions being developed. These extensions enable the construction of compositional application program interfaces (APIs) that have equal expressive power of query languages inside the programming language syntax. This makes it possible to implement the standard query operators. The standard query operators can be then applied to all sources of data, not just relational or XML domains.

This work aims to present and use the most important aspects of the language integrated query with special focus on the integration of the SQL query language into the C# programming language. Aspects as simplification of the way of writing queries, unification of the syntax for querying any data source, reinforcement of the connection between relational data and the object oriented world and less time spent in the software development process.

Keywords: query language, programming language, data models, SQL, C#, LINQ

RESUMO

Macaferi, Leniel Braz de Oliveira. Linguagem de pesquisa integrada à linguagem de programação. 2007. 96f. Monografia (bacharelado em Engenharia de Computação) - Centro Universitário de Barra Mansa, Barra Mansa, 2007. www.ubm.br


Dados formam a matéria prima da computação e são processados via software. Produtos de software são geralmente estruturados em camadas, tipicamente três: a camada de dados, a camada intermediária ou de lógica e a camada de apresentação ou do cliente. Cada uma destas camadas possui seu próprio modelo de dados. Estes diferentes paradigmas causam o problema da combinação mal sucedida entre estes três modelos completamente diferentes.

Ao invés de tentar unificar no nível do modelo de dados, uma melhor alternativa é unificar no nível das operações algébricas que podem ser definidas do mesmo modo sobre cada modelo de dados. Isto nos permite definir uma única linguagem de pesquisa que pode ser usada para pesquisar e transformar qualquer modelo de dados. Tudo o que os modelos de dados precisam implementar é um pequeno conjunto de operadores de pesquisa padrão, e cada modelo de dados pode fazer isto de uma maneira natural.

A indústria chegou a um ponto estável na evolução das tecnologias de programação orientada a objetos (OO). Desenvolvedores agora têm por certo as facilidades das linguagens de programação OO e seus ricos recursos iguais a classes, objetos, métodos e eventos. Tais linguagens suportam a criação e uso de bibliotecas de classe de estilo funcional de ordem maior. O suporte é o resultado das novas extensões de linguagem de programação que estão sendo desenvolvidas. Estas extensões permitem a criação de interfaces para programação de aplicativos (APIs) composicionais que possuem poderosas capacidades de pesquisa dentro da sintaxe da linguagem de programação. Isto torna viável a implementação dos operadores de pesquisa padrão. Os operadores de pesquisa padrão podem ser aplicados em todas as fontes de informação, não somente em domínios de bancos de dados relacionais ou XML.

Este trabalho visa apresentar e utilizar os aspectos mais importantes da linguagem integrada de pesquisa com foco na integração da linguagem de pesquisa SQL à linguagem de programação C#. Aspectos como a simplificação da maneira de escrever pesquisas, unificação da sintaxe para pesquisar qualquer fonte de dados, reforço da conexão entre dados relacionais e o mundo orientado a objetos e o menor tempo gasto no processo de desenvolvimento de software.

Palavras-chave: linguagem de pesquisa, linguagem de programação, modelos de dados, SQL, C#, LINQ

SUMÁRIO
1 INTRODUÇÃO 15
  1.1 Delimitação do tema 16
  1.2 Problema 16
  1.3 Enunciado das hipóteses 17
  1.4 Objetivos específicos e geral 18
  1.5 Justificativa do trabalho 18
2 FUNDAMENTAÇÃO TEÓRICA 19
  2.1 Linguagem de pesquisa 19
      2.1.1 Pesquisa 19
  2.2 Linguagem de programação 19
  2.3 Combinação mal sucedida entre as linguagens de pesquisa e de programação 20
  2.4 Programação orientada a objetos 24
      2.4.1 Classe e objeto 25
      2.4.2 Variável e tipo 25
      2.4.3 Membro 25
      2.4.4 Acessibilidade 25
      2.4.5 Método 26
      2.4.6 Parâmetro 26
      2.4.7 Troca de mensagem 26
      2.4.8 Herança 26
      2.4.9 Encapsulamento 26
      2.4.10 Abstração 27
      2.4.11 Polimorfismo 27
      2.4.12 Interface 27
      2.4.13 Delegate 27
  2.5 Banco de dados relacional 28
      2.5.1 Relação ou tabela 28
      2.5.2 Restrição 28
      2.5.3 Domínio de dado 28
      2.5.4 Chave primária 29
      2.5.5 Chave estrangeira 29
      2.5.6 Stored procedure 29
      2.5.7 View 29
      2.5.8 User defined function 30
  2.6 .NET Framework 30
      2.6.1 Principais recursos 32
      2.6.2 Arquitetura 33
      2.6.3 Infra-estrutura de linguagem comum 33
      2.6.4 Assemblies 34
      2.6.5 Metadados 35
      2.6.6 Biblioteca de classes base 35
  2.7 SQL 35
  2.8 C# 35
3 METODOLOGIA 37
  3.1 Extensões de linguagem 37
      3.1.1 Genéricos 37
      3.1.2 Métodos anônimos 38
      3.1.3 Iteradores 38
      3.1.4 Tipos parciais 40
      3.1.5 Tipos anuláveis 41
      3.1.6 Expressões de pesquisa 43
      3.1.7 Propriedades automaticamente implementadas 44
      3.1.8 Variáveis locais implicitamente tipificadas 45
      3.1.9 Métodos de extensão 46
      3.1.10 Métodos parciais 47
      3.1.11 Expressões lambda 49
      3.1.12 Inicializadores de objeto 50
      3.1.13 Inicializadores de coleção 50
      3.1.14 Tipos anônimos 51
      3.1.15 Arrays implicitamente tipificados 52
      3.1.16 Árvores de expressão 53
4 DESENVOLVIMENTO 54
  4.1 Linguagem de pesquisa integrada à linguagem de programação 54
      4.1.1 Operadores de pesquisa padrão 56
      4.1.2 Fonte de dados 61
      4.1.3 Operação de pesquisa 62
      4.1.4 Modelo de objetos 63
  4.2 Estudo de caso 64
      4.2.1 Classes do modelo de objetos 68
      4.2.2 DataContext 68
      4.2.3 Relacionamentos 69
      4.2.4 Pesquisa de dados 70
      4.2.5 Operações de insert, update e delete 71
5 CONCLUSÃO 73
  5.1 Avanços 73
  5.2 Limitações 74
  5.3 Trabalhos relacionados 74
  5.4 Trabalhos futuros 76
6 BIBLIOGRAFIA 78
ANEXOS 81

You can get a PDF copy of the full paper at:

https://drive.google.com/file/d/1nDbZXqKsE_jzxz4qB1ZOlgKu3LulSIKi/view?usp=sharing (Portuguese - Brazil)

Breadth and depth first search - part 2

Breadth and depth first search - part 1
Breadth and depth first search - part 3

As I've written in the previous post Breadth and depth first search - part 1 - I'll dive in more details and explain how to use the breadth and depth search methods. We'll execute a test case using the Romania map shown bellow, print the traveled paths in the screen, calculate the shortest path possible between two cities using Dijkstra's algorithm and print such a path in the screen.

To accomplish all the above let's start presenting the data structures used to represent the map in the C# programming language:
 

public class Node
{
  public Node(string key, object data);
  public Node(string key, object data, AdjacencyList neighbors);
 
  public virtual object Data { get; set; }
  public virtual string Key { get; }
  public virtual AdjacencyList Neighbors { get; }
  public virtual Node PathParent { get; set; }
 
  protected internal virtual void AddDirected(EdgeToNeighbor e);
  protected internal virtual void AddDirected(Node n);
  protected internal virtual void AddDirected(Node n, int cost);
} 

The Node class will be used to represent each city of the map.

Each city has its name represented by the key property and some other relevant data represented by the data property. Each city also has an adjacency list implemented by a specific class called AdjacencyList. This adjacency list represents the neighbors cities of a given city. For example, in the above map the neighbors cities of Bucharest are: Urziceni, Giurgiu, Pitesti and Fagaras.

Let's see the code of another class:
 

public class Graph
{
  public Graph();
  public Graph(NodeList nodes);
 
  public virtual int Count { get; }
  public virtual NodeList Nodes { get; }
 
  public virtual void AddDirectedEdge(Node u, Node v);
  public virtual void AddDirectedEdge(string uKey, string vKey);
  public virtual void AddDirectedEdge(Node u, Node v, int cost);
  public virtual void AddDirectedEdge(string uKey, string vKey,
int cost);
  public virtual void AddNode(Node n);
  public virtual Node AddNode(string key, object data);
  public virtual void AddUndirectedEdge(Node u, Node v);
  public virtual void AddUndirectedEdge(string uKey, string vKey);
  public virtual void AddUndirectedEdge(Node u, Node v, int cost);
  public virtual void AddUndirectedEdge(string uKey, string vKey,
int cost);
  public virtual void Clear();
  public virtual bool Contains(Node n);
  public virtual bool Contains(string key);
} 

The Graph class has a property that references a collection of nodes, that is, a collection of cities. This collection of cities is represented by the class NodeList that implements the so used interface IEnumerable.

As you can see the Graph class has methods that add directed or undirected edges to the graph. Each line that connects two cities (vertexes) in the Romania map is considered an edge.

The map above contains only undirected edges because they aren't defined just as one way paths between the cities. It's possible to go from Bucharest to Urziceni and then come back to Bucharest for example. So it's a two way path.

Above each line in the map is a value that represents the path cost between two cities. Let's consider this cost as the distance in miles between the cities. The path cost could be any other variable, for example, the time spent to traverse the distance (edge). The cost variable can vary according to the problem.

I implemented a class called Pathfinding as follows:
 

class Pathfinding
{
  private static Graph graph = new Graph();
 
  ...
 
  public static void BreadthFirstSearch(Node start, Node end)
  {
    ...
  }
  public static void DepthFirstSearch(Node start, Node end)
  {
    ...
  }

  ...
}

This class has additional properties and methods as ShortestPath and PrintPath. I won't spend time explaining its additional methods because they are already well explained in Part 5: From Trees to Graphs (article by Scott Mitchell). So, let's run a test case. For this we need to fill the graph with the Romania map data.
 

class Program
{
  static void Main(string[] args)
  {
    Pathfinding pathFinding = new Pathfinding();
 
    Node start, end;
 
    // Vertexes
    pathFinding.Graph.AddNode("Arad", null);
    pathFinding.Graph.AddNode("Bucharest", null);
    pathFinding.Graph.AddNode("Craiova", null);
    pathFinding.Graph.AddNode("Dobreta", null);
    pathFinding.Graph.AddNode("Eforie", null);
    pathFinding.Graph.AddNode("Fagaras", null);
    pathFinding.Graph.AddNode("Giurgiu", null);
    pathFinding.Graph.AddNode("Hirsova", null);
    pathFinding.Graph.AddNode("Iasi", null);
    pathFinding.Graph.AddNode("Lugoj", null);
    pathFinding.Graph.AddNode("Mehadia", null);
    pathFinding.Graph.AddNode("Neamt", null);
    pathFinding.Graph.AddNode("Oradea", null);
    pathFinding.Graph.AddNode("Pitesti", null);
    pathFinding.Graph.AddNode("Rimnicu Vilcea", null);
    pathFinding.Graph.AddNode("Sibiu", null);
    pathFinding.Graph.AddNode("Timisoara", null);
    pathFinding.Graph.AddNode("Urziceni", null);
    pathFinding.Graph.AddNode("Vaslui", null);
    pathFinding.Graph.AddNode("Zerind", null);
 
    // Edges
 
    // Arad <-> Zerind
    pathFinding.Graph.AddUndirectedEdge("Arad", "Zerind", 75);
    // Arad <-> Timisoara
    pathFinding.Graph.AddUndirectedEdge("Arad", "Timisoara", 118);
    // Arad <-> Sibiu
    pathFinding.Graph.AddUndirectedEdge("Arad", "Sibiu", 140);
 
    // Bucharest <-> Urziceni
    pathFinding.Graph.AddUndirectedEdge("Bucharest", "Urziceni",
85);
    // Bucharest <-> Giurgiu
    pathFinding.Graph.AddUndirectedEdge("Bucharest", "Giurgiu",
90);
    // Bucharest <-> Pitesti
    pathFinding.Graph.AddUndirectedEdge("Bucharest", "Pitesti",
101);
    // Bucharest <-> Fagaras
    pathFinding.Graph.AddUndirectedEdge("Bucharest", "Fagaras",
211);
 
    // Craiova <-> Dobreta
    pathFinding.Graph.AddUndirectedEdge("Craiova", "Dobreta",
120);
    // Craiova <-> Pitesti
    pathFinding.Graph.AddUndirectedEdge("Craiova", "Pitesti",
138);
    // Craiova <-> Rimnicu Vilcea
    pathFinding.Graph.AddUndirectedEdge("Craiova",
"Rimnicu Vilcea", 146);
 
    // Dobreta <-> Mehadia
    pathFinding.Graph.AddUndirectedEdge("Dobreta", "Mehadia", 75);
 
    // Eforie <-> Hirsova
    pathFinding.Graph.AddUndirectedEdge("Eforie", "Hirsova", 86);
 
    // Fagaras <-> Sibiu
    pathFinding.Graph.AddUndirectedEdge("Fagaras", "Sibiu", 99);
 
    // Hirsova <-> Urziceni
    pathFinding.Graph.AddUndirectedEdge("Hirsova", "Urziceni", 98);
 
    // Iasi <-> Neamt
    pathFinding.Graph.AddUndirectedEdge("Iasi", "Neamt", 87);
    // Iasi <-> Vaslui
    pathFinding.Graph.AddUndirectedEdge("Iasi", "Vaslui", 92);
 
    // Lugoj <-> Mehadia
    pathFinding.Graph.AddUndirectedEdge("Lugoj", "Mehadia", 70);
    // Lugoj <-> Timisoara
    pathFinding.Graph.AddUndirectedEdge("Lugoj", "Timisoara",
111);
 
    // Oradea <-> Zerind
    pathFinding.Graph.AddUndirectedEdge("Oradea", "Zerind", 71);
    // Oradea <-> Sibiu
    pathFinding.Graph.AddUndirectedEdge("Oradea", "Sibiu", 151);
 
    // Pitesti <-> Rimnicu Vilcea
    pathFinding.Graph.AddUndirectedEdge("Pitesti", "Rimnicu Vilcea", 97);
 
    // Rimnicu Vilcea <-> Sibiu
    pathFinding.Graph.AddUndirectedEdge("Rimnicu Vilcea", "Sibiu",
80);
 
    // Urziceni <-> Vaslui
    pathFinding.Graph.AddUndirectedEdge("Urziceni", "Vaslui",
142);
 
    start = pathFinding.Graph.Nodes["Oradea"];
 
    end = pathFinding.Graph.Nodes["Neamt"];
 
    Console.WriteLine("\nBreadth First Search algorithm");
 
    Pathfinding.BreadthFirstSearch(start, end);
 
    foreach(Node n in pathFinding.Graph.Nodes)
      n.Data = null;
 
    Console.WriteLine("\n\nDepth First Search algorithm");
 
    Pathfinding.DepthFirstSearch(start, end);
 
    Console.WriteLine("\n\nShortest path");
 
    Pathfinding.ShortestPath(start, end);
 
    pathFinding.Graph.Clear();
 
    Console.ReadKey();
  }
} 

Firstly we create a new instance of the Pathfinding class and two instances of the Node class that will reference the start and end city respectively.

The pathfinding object has a graph property that we use to store the nodes, that is, the cities of the map. To accomplish this the method AddNode of the Graph class is used.

The key that represents the node is the name of the city in this case.

After adding the cities to the graph it's time to connect the cities by means of the edges between them. For each undirected edge of the map the fourth overload of the AddUndirectedEdge method is used. The method receives as arguments the names of the edge's vertexes and the path cost.

Supposing we want to go from Oradea to Neamt, we must set the start and end node appropriately and that is done when the start and end node are assigned the values already present in the graph.

After everything is set up we can run the the breadth and depth first search methods. To start I call the the method BreadthFirstSearch already presented in the previous post Breadth and depth first search - part 1. The method receives the start and end nodes as arguments and traverses the graph according to the breadth first search algorithm. During the traversal it prints the paths in the screen so that it's easier to visually debug the code.

We use the same graph data to run the depth first search method but to avoid a wrong behavior it's necessary to set the data property of each graph's node to null. It's because such property is used to store a value indicating if that node was already visited during the path traversal of the breadth first search method.

OK. Now that the graph data is prepared we can run the depth first search method invoking the DepthSearchMethod of the Pathfinding class. This method receives the start and end nodes as arguments and traverses the graph according to the depth first search algorithm. During the traversal it prints the paths in the screen so that it's easier to visually debug the code.

The last and so important method is the ShortestPath one. The shortest path problem can be calculated through different algorithms. In this case the algorithm used (Dijkstra's algorithm) is suitable because we don't have negative costs otherwise we should use other algorithms. The ShortestPath method of the Pathfinding class receives as arguments the start and end nodes and prints in the screen the total distance of such a path and the cities travelled.

See the screenshot of the output:

Note: if you want to see a C++ implementation of the breadth and depth first search, check the third part of this series: Breadth and depth first search - part 3.

Get the complete code (Microsoft Visual C# 2008 solution) and executable of this post at:
https://github.com/leniel/leniel.net/blob/master/Uploads/BreadthDepthFirstSearchCPlusPlus.zip

To try out the code you can use the free Microsoft Visual C# 2008 Express Edition that you can get at: http://www.microsoft.com/express/vcsharp

Client server architecture

Client Server architecture
Client-Server architectureThe client server architecture is a computer architecture which separates a client computer from a server computer and is most implemented in a computer network such as the Internet. Each client or server connected to the network can be referenced as a node. The most common type of client server architecture includes only two types of nodes: client and server. This type of architecture is sometimes called 2-tier architecture. The 2-tier architecture makes possible the sharing of files and resources between devices connected to the computer network.

Client
A client is a computer that requests a service of a server. The service can be localized in other computer or in the same computer that requests the service, that is, the client computer can act as a server too.

Access to the service is available through some type of interface.

Types of clients
Common types of clients include:

  • Web browser - a web browser is a software application that enables a user to display and interact with text, images, videos, music and other information typically located on a web page at a website on the world wide web or a local network.
  • Email client - an e-mail client, also known as mail user agent (MUA) is a frontend computer program used to manage email. Sometimes, the term e-mail client is also used to refer to any agent acting as a client toward an e-mail server, independently of it being a real MUA, a relaying server, or a human typing directly on a terminal. In addition, a web application such as the Gmail from Google providing the relevant functionality is sometimes considered an email client.
  • Online chat client - an online chat client or instant messaging client is a software application that enables the user to engage in instant messaging.

Server
A server is a computer that accepts connections to serve the requests of client computers. The server sends responses to the client computers. A server is generally connected to a computer network.

Types of servers
The most common types of servers are:

  • Application server - a computer with a software engine that delivers applications to client computers or devices, typically through the Internet and using the hyper text transfer protocol (HTTP). The application server handles most, if not all, of the business logic and data access of the application.
  • Backup server - a computer responsible for safekeeping important business files, financial records, and personal music and pictures so that in case of the original files be deleted there will still exist a copy.
  • Database server - a computer responsible for the provisioning of database services to other computer programs or computers. Database servers store the database on a dedicated computer system, allow it to be accessed concurrently, maintain the integrity of the data, and handle transaction support and user authorization. A database server divides an application into a front end and a back end, in accordance with the client-server model. The front end runs on the client computer and displays the requested data. The back end runs on the server and handles tasks such as query processing, data analysis and storage.
  • DHCP server - a computer that provides the dynamic host configuration protocol (DHCP) service on the network that assigns network addresses to other devices, such as a telephone and computer.
  • DNS server - a computer that provides the domain name service (DNS) which translates the names of sites into their numerical addresses, that is, resolves the name of the website typed into the location field of the browser with the Internet protocol (IP) address of the server that will send the requested information from that website.
  • Domain server - a computer that holds all the records associated with a particular domain and answers queries about those records.
  • Email server - a computer acting as a mail transfer agent (MTA) or SMTPD (SMTP daemon) that is running the appropriate software. This computer is responsible for transferring the electronic mail messages from one computer to another.
  • File server - a computer responsible for the central storage and management of data files so that other computers on the same network can access those files.
  • Firewall server - designed to secure an internal network from threats and attacks that come from the Internet offering powerful and flexible control over all inbound and outbound network traffic.
  • FTP server - a computer that runs a program that can receive requests for a file transfer protocol (FTP) link from a client computer.
  • Logon server - a computer responsible for authenticating client computers in a domain.
  • Print server - a computer to which one or more printers are connected, which can accept print jobs from external client computers connected to the print server over a network. The print server then sends the data to the appropriate printer that it manages.
  • Proxy server - a computer system or an application program which services the requests of its clients by forwarding requests to other servers. A client connects to the proxy server, requesting some service, such as a file, connection, web page, or other resource, available from a different server. The proxy server requests the service on behalf of the client between an internal network of an organization and the Internet as part of a security system to protect the organization's network of external intrusion.
  • Security server - a computer dedicated to provide security policy decisions and to enforce such decisions.
  • Terminal server - a specialized computer which aggregates multiple communication channels together. Because these channels are bidirectional, two models emerge: multiple entities connecting to a single resource, and a single entity connecting to multiple resources.
  • VPN server - a virtual private network (VPN) server is a piece of hardware or software that can act as a gateway into a whole network or a single computer. It is generally "always on" and listening for VPN clients to connect to it.
  • Web server - a computer that runs a program responsible for accepting HTTP requests from clients, such as web browsers, serving them HTTP responses along with optional data contents, which usually are web pages such as HTML documents and linked objects (images, etc.).

Note: a web service can be considered a type of server. A web service is defined by the W3C as "a software system designed to support interoperable Machine to Machine interaction over a network." Web services are frequently just Web APIs that can be accessed over a network, such as the Internet, and executed on a remote system hosting the requested services.

Conclusion
The client server architecture is extremely used nowadays. Practically all the services that are present in the Internet are based on this architecture. Even a simple visit to a news page as much as the connection to verify the grades stored in the university’s server; all these services make use of the client server architecture.

The details related to planning, equipment acquisition, installation and configuration of software and hardware are the most important when an environment based on the client server architecture is being considered.

The application type offered by the server and the type of client that will access such application are key points to end up with a concise project.

It’s known that the greater page requisitions a server has the greater will be the necessity of a powerful hardware capable of assuring the proper operation of the application even in stress conditions. Therefore it’s important not only understand the basic concepts of the client server architecture but also acquire the understanding about specific concepts related to software and hardware, which form the base of the architecture.

We are witnesses of this fantastic era in which we see the creation of new services starting up through the great opportunities that the new wave of the internet called web 2.0 brings. Such services demands better approaches to the client server architecture.

Programming technologies, new hardware devices and network apparatus, multiple multimedia capabilities, wireless connections, etc – all reinforces the propagation of the client server architecture. As a basic example, look at the biggest video site of the planet YouTube, which receives millions of video requests and that needs to provide access to its clients in a smoothly way. All this in a transparent way to its clients.

The client server architecture will subsist for a long time because it leverages the so used means of communication of actuality called internet.

Breadth and depth first search - part 1

Breadth and depth first search - part 2
Breadth and depth first search - part 3

Search algorithms
The idea behind search algorithms is to simulate the exploration of a space of states or search space through the generation of successor states already explored.

Tree based search algorithms
Tree search algorithms are the heart of searching techniques. The basic principle is that a node is taken from a data structure, its successors examined and added to the data structure. By manipulating the data structure, the tree is explored in different orders for instance level by level (breadth-first search) or reaching a leaf node first and backtracking (depth-first search).

Definitions

  • State - a representation of a physical configuration.
  • Node - a data structure part of a search tree. A node has the following properties: father, sons, depth and path cost.
States don't have father, sons, depth and path cost.

Types of problems
  • Deterministic accessible - simple states problem.
  • Deterministic inaccessible - multiple states problem.
  • Non deterministic inaccessible - contingency problem (maybe) - sensors must be used during the execution process. The solution is a tree or a policy. Many times alternates between search and execution.
  • Unknown search space - exploration problem (online).
Selecting a search space
The real world is extremely complex. The search space must be abstracted from the solution process.

Abstracting we have:
  • Abstract state - the set of real states.
  • Abstract operator - combination of real actions.
  • Abstract solution - set of real paths that represents a solution in the real world.
Each abstract action must be easier than the action in the real problem.

A simple algorithm
function GeneralSearch(problem, strategy)
  return a solution or failure
 
Initialize the search tree using the initial state of the problem
 
loop do
  if there are no candidates for expansion then
    return failure
 
Choose a leaf node for expansion according to strategy
 
if the node contains a goal state then
  return the corresponding solution
else
  expand the node and add the resulting nodes to the tree
 
end

Algorithm implementation
function GeneralSearch(problem, QueueingFunction)
nodes <- MakeQueue(MakeNode(InitialState[problem]))
loop do
  if nodes is empty then
    return failure
  node <- RemoveFront(nodes) a leaf node for expansion according
to strategy
  if GoalTest[problem] applied to State(node) succeeds then
    return node
  nodes <- QueueingFunction(nodes, Expand(node, Operators[problem]))
end

Search strategies
A search strategy is a choice made between the methods used to expand the nodes. Each method has its proper order of expansion. Strategies are evaluated according to following parameters:
  • Time complexity - the number of generated or expanded nodes.
  • Space complexity - maximum number of nodes presented in the memory.
  • Optimality - the solution found has the minimum cost?
Complexities in time and space are measured in terms of:
  • b - maximum ramification factor of the search tree.
  • d - depth of the solution that has the minimum cost.
  • m - maximum depth of the search space (can be ∞) infinite.

The two strategies presented in this post (breadth and depth first search) are considered uninformed strategies because they only use the information available in the problem definition. 

Breadth first search
The breadth first search expands the less deep node. The data structured used to implement this search algorithm is a queue. 

Breadth first search properties
Is complete? Yes, if (b is finite).

Time: 1 + b2 + b3 + … + bd = O(bd). e.g.: exponential in d

Space: O(bd) (maintains all the nodes in memory)

Optimal? Yes, if cost per step is 1; in general is not optimal.

Space is the big problem for can generate nodes at a rate of 1 MB / sec. 24 hours = 86 GB.

Depth first search
The depth first search expands the deepest node.

The data structure used to implement this search algorithm is a stack.

An important note about the depth first search is that it can execute infinite cycles so it’s mandatory to check the states already visited to avoid such infinite cycles.

Depth first search properties
Is complete? No. It is imperfect in spaces of infinite depth or in cyclic paths. Is complete only in a finite search space.

Time: O(bm). Is very bad if m is a lot greater than d. If solutions are dense can be fastest than the breadth first search.

Space: O(bm) e.g.: linear in space

Optimal? No. 

Working problem: vacation in Romania
In this post let's explore a simple states problem as shown in the following picture:

Suppose we are taking a vacation in Romania. More specifically we are in the city called Arad. We want to go to Bucharest the capital city of Romania and our flight takes off tomorrow. Breaking down our problem, we have:

  • Objective - go to Bucharest.
  • Search space - several cities to get to Bucharest.
  • Operator - drive through the cities.
In this case a solution is a sequence of cities so that the last city is Bucharest.

A valid path could be: Arad, Sibiu, Fagaras, Bucharest. e.g.: Arad -> Zerind represents a complex set of possible paths, returns, pauses for resting, etc.

To guarantee the realizability, any real state in “Arad” must take us to some real state in “Zerind”.

The problem's formulation can be:
  • Initial state - Arad
  • Operator (or successor function S(x)) - eg.: Arad -> Zerind, Arad -> Sibiu, etc.
  • Goal test - can be explicit or implicit as x = Bucharest or NoDirty(x).
  • Path cost (additive) - can be the sum of distances, used operators, etc.

The solution is a sequence of operators that takes from the initial state to the goal state. 

Implementing the breadth and depth first search in C#
The following methods use two classes implemented by Scott Mitchell [1]. The classes are: Node and EdgeToNeighbor.

I just added a new property called PathParent in the node class.
 

public static void BreadthFirstSearch(Node start, Node end)
{
  Queue<Node> queue = new Queue<Node>();
 
  queue.Enqueue(start);
 
  while(queue.Count != 0)
  {
    Node u = queue.Dequeue();
 
    // Check if node is the end
    if(u == end)
    {
      Console.Write("Path found.");
 
      break;
    }
    else
    {
      u.Data = "Visited";
 
      // Expands u's neighbors in the queue
      foreach(EdgeToNeighbor edge in u.Neighbors)
      {
        if(edge.Neighbor.Data == null)
        {
          edge.Neighbor.Data = "Visited";
 
          if(edge.Neighbor != end)
          {
            edge.Neighbor.PathParent = u;
 
            PrintPath(edge.Neighbor);
          }
          else
          {
            edge.Neighbor.PathParent = u;
 
            PrintPath(edge.Neighbor);
 
            return;
          }
 
          Console.WriteLine();
        }
        /* shows the repeated nodes
        else
        {
          Console.Write(edge.Neighbor.Key);
        } */
 
        queue.Enqueue(edge.Neighbor);
      }
    }
  }
}

public static void DepthFirstSearch(Node start, Node end)
{
  Stack<Node> stack = new Stack<Node>();
 
  stack.Push(start);
 
  while(stack.Count != 0)
  {
    Node u = stack.Pop();
 
    // Check if node is the end
    if(u == end)
    {
      Console.WriteLine("Path found");
 
      break;
    }
    else
    {
      u.Data = "Visited";
 
      // Store n's neighbors in the stack
      foreach(EdgeToNeighbor edge in u.Neighbors)
      {
        if(edge.Neighbor.Data == null)
        {
          edge.Neighbor.Data = "Visited";
 
          if(edge.Neighbor != end)
          {
            edge.Neighbor.PathParent = u;
 
            PrintPath(edge.Neighbor);
          }
          else
          {
            edge.Neighbor.PathParent = u;
 
            PrintPath(edge.Neighbor);
 
            return;
          }
 
          Console.WriteLine();
 
          stack.Push(edge.Neighbor);
        }
        /* shows the repeated nodes
        else
        {
          Console.Write(edge.Neighbor.Key);
        } */
      }
    }
  }
}

Summary
The formulation of a problem requires abstraction to avoid irrelevant details of the real world so that a search space can be defined and explored.

In a next post I'll dive in more details and explain how to use the breadth and depth search methods. We'll execute a test case using the Romania map shown in the picture above, print the traveled paths in the screen, calculate the shortest path possible between two cities using Dijkstra's algorithm and print such a path in the screen. Before that I recommend the reading of the excellent revision about data structures written by Scott Mitchell [1]. Although I only use the facts exposed in part 5 of Scott's article: From Trees to Graphs, it's worth to read starting in part 1. For now, look and think about the breadth and depth first search implementation in C#. 

References
[1] Mitchell, Scott. An Extensive Examination of Data Structures. 2004. Available at <http://msdn2.microsoft.com/en-us/library/aa289152(VS.71).aspx>. Accessed on January 8, 2008.

[2] Artificial Intelligence Depot. Blind search. Available at <http://ai-depot.com/Tutorial/PathFinding-Blind.html>. Accessed on January 8, 2008.

One Laptop per Child (OLPC) project evaluation

Evaluating and diving into the bits about the OLPC project, it's obvious the discrepancy between the configuration of modern computers and the laptop proposed. The superiority of modern computers in terms of hardware is evident. On the other hand, the OLPC's laptop aims at the students that in their day-to-day will cope with tasks not so sophisticated. This way, the OLPC’s characteristics will satisfy with praise a lot of children around the world.

OLPC logoThe key point of the OLPC project is to permit children to interact with each other through the mesh network that is available in the hardware and at the same time to enable the access to the fantastic computer network called internet.

The hardware configuration isn’t so attractive, but to those that never thought in the possibility of having a computer, it’s a valid option.

It’s important to mention that great part of the world's population is formed by people bellow the poverty line since 1.1 billion people, about a 1/6 of humanity lives with less than 1 dollar per day.

The initiative of the Massachusetts Institute of Technology (MIT) is fantastic. Since it's coming from a country that has a dictatorial economy that provides the rules to other markets nowadays and from an institute renowned internationally and financed by great technology companies, we can expect good fruits of the progress of this project.

It’s the underdeveloped governments’ responsibility to implement the policies and enforce the proper acting for the technology area, which is related directly with the future of such nations. The poverty and the barrier imposed on the knowledge access will only be suppressed when all or at least the major part of population have access to the better of this world. Unfortunately this isn’t the real thing and any project that tries to diminish the gap of the digital exclusion is welcome.

Concluding this brief evaluation I consider the OLPC an outstanding initiative of low cost given that the necessity for knowledge and access to new technologies is increasing at a high pace in the underdeveloped nations.