Easily parse string values with .NET

By Tony Patton, TechRepublic
Wednesday, January 25, 2006 03:03 PM
The .NET Framework simplifies processing and formatting data with the String class and its Split and Join methods or regular expressions. Learn more about using these methods in your application.
Processing string values is an integral aspect of most application development projects. This often involves parsing strings into separate values. For instance, receiving data from an external data source such as a spreadsheet often utilizes a common format like comma-separated values. The .NET String class simplifies the process of extracting the individual values between the commas.

Extracting values
The Split method of the String class extracts individual values separated by a specific character. The separator is passed to the method; an overload accepts a second parameter that specifies the maximum number of elements to return. (Note: You can specify more than one separator in a character array.) The extracted values are returned in a String array.

Here are the two variations:

  • String.Split(char[]) in C# or String.Split(Char()) in VB.NET
  • String.Split(char[], int) in C# or String.Split(Char(), Integer) in VB.NET

The following C# snippet populates an array with values contained in a comma-separated string value:

string values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com";

// Split on the comma; each piece becomes one array element.
string[] sites = values.Split(',');
foreach (string s in sites) {
    Console.WriteLine(s);
}

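The following output is generated (because each comma in the source string is followed by a space, every value after the first keeps that leading space):

TechRepublic.com
 CNET.com
 News.com
 Builder.com
 GameSpot.com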

The equivalent VB.NET code follows:

Dim values As String = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com"

' The c suffix makes "," a Char literal; a plain String literal
' does not compile here when Option Strict is On.
Dim sites As String() = values.Split(","c)

For Each s As String In sites
    Console.WriteLine(s)
Next
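Since Split leaves the space after each comma in place, it can help to trim every value after splitting. The following C# sketch is one way to do that; the SplitAndTrim helper is a hypothetical name introduced here for illustration, not part of the String class, and it assumes .NET 2.0 or later for the StringSplitOptions overload.

```csharp
using System;

public static class SplitTrimExample
{
    // Hypothetical helper (not from the article): splits on commas,
    // drops entries that are empty before trimming, then trims
    // leading and trailing whitespace from each remaining value.
    public static string[] SplitAndTrim(string input)
    {
        string[] parts = input.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Trim();
        }
        return parts;
    }

    public static void Main()
    {
        string values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com";
        foreach (string s in SplitAndTrim(values))
        {
            // Brackets make any stray whitespace visible.
            Console.WriteLine("[" + s + "]");
        }
    }
}
```

Each line of output is a site name wrapped in brackets with no space inside, for example [CNET.com].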

You may specify multiple separator characters by passing them in a character array. The following code splits a string of values separated by a comma, semicolon, or colon. In addition, it uses the optional second parameter to cap the number of items returned at four.

// Any of these three characters acts as a separator.
char[] sep = { ',', ':', ';' };

string values = "TechRepublic.com: CNET.com, News.com, Builder.com; GameSpot.com";

// Return at most four elements.
string[] sites = values.Split(sep, 4);
foreach (string s in sites) {
    Console.WriteLine(s);
}

The following output is generated (notice that the second parameter places the remainder of the string in the last array element, and that each value after the first keeps the space that followed its separator):

TechRepublic.com
 CNET.com
 News.com
 Builder.com; GameSpot.com

The equivalent VB.NET code follows:

Dim values As String = "TechRepublic.com: CNET.com, News.com, Builder.com; GameSpot.com"

' Three separators. Note that Dim sep(3) would allocate four elements
' (indexes 0 through 3), because VB.NET takes the upper bound, not the length.
Dim sep() As Char = {","c, ":"c, ";"c}

Dim sites As String() = values.Split(sep, 4)

For Each s As String In sites
    Console.WriteLine(s)
Next

While the Split method allows you to easily work with individual elements contained in a string value, you may need to format values according to a predefined format like comma-separated values. The String class makes it easy to assemble a properly formatted string.

Putting it together
The Join method of the String class accepts the string to be used as the separator as its first parameter. The values to be concatenated are passed as the second parameter in the form of a string array. An overload accepts two integers as the third and fourth parameters: the third specifies the index of the first array element to use, and the fourth is the number of elements to join.

The following C# code sample demonstrates assembling the values used in the previous example:

string sep = ", ";

string[] values = {
    "TechRepublic.com", "CNET.com", "News.com", "Builder.com", "GameSpot.com"
};

// Concatenate all elements, placing the separator between them.
string sites = String.Join(sep, values);
Console.Write(sites);

The following output is generated:

TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com

The equivalent VB.NET follows:

Dim sep As String = ", "

Dim values() As String = {"TechRepublic.com", "CNET.com", "News.com", "Builder.com", "GameSpot.com"}

' Concatenate all elements, placing the separator between them.
Dim sites As String = String.Join(sep, values)
Console.Write(sites)

We could use the overload to specify where to begin and how many elements to include in the result. The following sample begins with the third element (index 2, since element numbering begins at zero) and joins three elements, producing News.com, Builder.com, GameSpot.com:

Dim sep As String = ", "

Dim values() As String = {"TechRepublic.com", "CNET.com", "News.com", "Builder.com", "GameSpot.com"}

' Start at index 2 and join three elements.
Dim sites As String = String.Join(sep, values, 2, 3)
Console.Write(sites)

The starting index and the number of elements to join must describe a range that lies within the string array. If either value is negative, or if the starting index plus the count exceeds the array length, an ArgumentOutOfRangeException is thrown. For this reason, it is a good idea to check the range beforehand or to use a try/catch block to handle any problems.
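The range rule above can be sketched in C# as follows. This is a minimal illustration, not the only way to handle it: TryJoin is a hypothetical helper name that checks the range first and returns null when it is invalid, while Main also shows the try/catch alternative.

```csharp
using System;

public static class JoinRangeExample
{
    // Hypothetical helper (not part of the String class): returns null
    // instead of throwing when the requested range is not inside the array.
    public static string TryJoin(string separator, string[] items, int startIndex, int count)
    {
        if (items == null || startIndex < 0 || count < 0 || startIndex > items.Length - count)
        {
            return null;
        }
        return String.Join(separator, items, startIndex, count);
    }

    public static void Main()
    {
        string[] values = {
            "TechRepublic.com", "CNET.com", "News.com", "Builder.com", "GameSpot.com"
        };

        // Valid range: indexes 2, 3, and 4.
        Console.WriteLine(TryJoin(", ", values, 2, 3));

        // Invalid range: index 4 plus three elements runs past the end.
        Console.WriteLine(TryJoin(", ", values, 4, 3) == null ? "invalid range" : "ok");

        // The try/catch alternative for the same invalid call.
        try
        {
            Console.WriteLine(String.Join(", ", values, 4, 3));
        }
        catch (ArgumentOutOfRangeException ex)
        {
            Console.WriteLine("Join failed: " + ex.ParamName);
        }
    }
}
```

Checking the range up front avoids the cost of an exception when bad input is expected; the try/catch form is simpler when an invalid range would be a genuine programming error.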

While the String class provides the necessary methods, it isn't the only way to handle the parsing of a string value. Another common approach takes advantage of regular expressions.

Parsing with regular expressions
The .NET Framework provides the Regex class contained in the System.Text.RegularExpressions namespace for using regular expressions within a .NET application. Parsing is only one of the many applications of regular expressions.

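Let's examine the parsing of our sample string using regular expressions. The following ASP.NET page uses C# to parse a comma-delimited list of sites into an array. The pattern matches a comma only when it is followed by an even number of quotation marks, so a comma inside a quoted value is not treated as a separator: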

<%@ Page Language="C#" Debug="true" %>
<%@ Import Namespace="System.Text.RegularExpressions" %>

<script language="C#" runat="server">

private void Page_Load(object sender, System.EventArgs e) {
    if (!IsPostBack) {
        string values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com";

        // Match commas that are not inside a pair of quotation marks.
        string pattern = ",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))";
        Regex r = new Regex(pattern);

        string[] sites = r.Split(values);
        foreach (string s in sites) {
            Response.Write(s);
            Response.Write("<br>");
        }
    }
}

</script>

The equivalent VB.NET code follows. Notice that the quotation marks inside the pattern need different handling: VB.NET does not treat the backslash as an escape character in string literals, so each quotation mark contained in the string must be written as two adjacent quotation marks to be recognized.

<%@ Page Language="VB" Debug="true" %>
<%@ Import Namespace="System.Text.RegularExpressions" %>

<script language="VB" runat="server">

Sub Page_Load
    If Not (IsPostBack) Then
        Dim values As String = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com"

        ' Match commas that are not inside a pair of quotation marks.
        ' Each "" inside the literal stands for one quotation mark.
        Dim pattern As String = ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"
        Dim r As Regex = New Regex(pattern)

        Dim sites As String() = r.Split(values)

        For Each s As String In sites
            Response.Write(s)
            Response.Write("<br>")
        Next
    End If
End Sub

</script>
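The sample string above contains no quotation marks, so the regular expression behaves just like String.Split there. The following console sketch uses a hypothetical input (not from the article) with a quoted value that contains a comma, to show where the pattern makes a difference. The pattern is the same one used above, written as a C# verbatim string in which each quotation mark is doubled.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedSplitExample
{
    // Same pattern as in the pages above: a comma followed by an even
    // number of quotation marks, i.e. a comma outside any quoted value.
    private static readonly Regex CommaOutsideQuotes =
        new Regex(@",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))");

    // Hypothetical helper name, introduced for illustration.
    public static string[] SplitOutsideQuotes(string input)
    {
        return CommaOutsideQuotes.Split(input);
    }

    public static void Main()
    {
        // Hypothetical input: the second value contains a comma inside quotes.
        string values = "TechRepublic.com,\"CNET, Inc.\",News.com";

        foreach (string s in SplitOutsideQuotes(values))
        {
            Console.WriteLine(s);
        }
        // Prints:
        // TechRepublic.com
        // "CNET, Inc."
        // News.com
    }
}
```

Note that the quotation marks remain part of the extracted value; removing them, for instance with Trim('"'), is a separate step.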




Talkback 1 comments

Easily parse string values with .NET
Dim values As String

values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com"

Dim sites As String() = Nothing

sites = values.Split(",")


does not work
Posted by Ingmar Eidem on Friday, April 24 2009 06:30 AM

