Easily parse string values with .NET

By Tony Patton, TechRepublic
Wednesday, January 25, 2006 03:03 PM

The .NET Framework simplifies parsing and formatting string data, either with the String class's Split and Join methods or with regular expressions. Learn how to use both approaches in your applications.

Processing string values is an integral part of most application development projects, and it often involves parsing a string into separate values. For instance, data received from an external source such as a spreadsheet often arrives in a common format like comma-separated values (CSV). The .NET String class simplifies extracting the individual values between the commas.

Extracting values
The Split method of the String class extracts individual values separated by a specific character. You pass the separator to the method, and a second overload accepts an additional parameter that specifies the maximum number of elements to return. (You can specify more than one separator by passing a character array.) The extracted values are returned in a String array.

Here are the two overloads:

  • String.Split(char[]) in C# or String.Split(Char()) in VB.NET
  • String.Split(char[], int) in C# or String.Split(Char(), Integer) in VB.NET

The following C# snippet populates an array with values contained in a comma-separated string value:

string values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com";
string[] sites = values.Split(',');
foreach (string s in sites) {
    // Trim removes the space that follows each comma in the source string
    Console.WriteLine(s.Trim());
}

The following output is generated:

TechRepublic.com
CNET.com
News.com
Builder.com
GameSpot.com

The equivalent VB.NET code follows:

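Dim values As String
values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com"
Dim sites As String() = Nothing
' ","c is a Char literal, which is what Split expects
sites = values.Split(","c)
Dim s As String
For Each s In sites
    ' Trim removes the space that follows each comma in the source string
    Console.WriteLine(s.Trim())
Next s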
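Split itself does not discard the space that follows each comma in the sample string, so every element after the first begins with a space unless it is trimmed. The following minimal sketch makes that space visible and removes it with String.Trim. It assumes a modern C# compiler (C# 9 or later, for top-level statements), which the original samples do not require.

```csharp
// Assumes C# 9 or later (top-level statements, string interpolation).
using System;

string values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com";

// Split keeps whatever surrounds each comma, so elements 1 through 4 start with a space.
string[] raw = values.Split(',');

// Array.ConvertAll builds a new array by applying Trim to every element.
string[] trimmed = Array.ConvertAll(raw, s => s.Trim());

for (int i = 0; i < raw.Length; i++)
{
    // The brackets make the leading space visible, e.g. "[ CNET.com] -> [CNET.com]".
    Console.WriteLine($"[{raw[i]}] -> [{trimmed[i]}]");
}
```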

You may specify multiple separator characters in a character array. The following code splits a string of values separated by a comma, semicolon, or colon. It also uses the optional second parameter to limit the number of items returned to four.

// Split on a comma, colon, or semicolon
char[] sep = new char[3];
sep[0] = ',';
sep[1] = ':';
sep[2] = ';';
string values = "TechRepublic.com: CNET.com, News.com, Builder.com; GameSpot.com";
// Return at most four elements
string[] sites = values.Split(sep, 4);
foreach (string s in sites) {
    Console.WriteLine(s.Trim());
}

The following output is generated (notice that the second parameter places the remainder of the string in the last array element):

TechRepublic.com
CNET.com
News.com
Builder.com; GameSpot.com

The equivalent VB.NET code follows:

Dim values As String
values = "TechRepublic.com: CNET.com, News.com, Builder.com; GameSpot.com"
Dim sites As String() = Nothing
' VB.NET array bounds are inclusive: sep(2) declares three elements (0 through 2)
Dim sep(2) As Char
sep(0) = ","c
sep(1) = ":"c
sep(2) = ";"c
sites = values.Split(sep, 4)
Dim s As String
For Each s In sites
    Console.WriteLine(s.Trim())
Next s
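The count parameter is an upper limit, not an exact size. When count is 1, Split returns a single element containing the entire string; when count is larger than the number of substrings, every substring is returned. The minimal sketch below tries several limits on the same input; it assumes C# 9 or later for top-level statements.

```csharp
// Assumes C# 9 or later (top-level statements, string interpolation).
using System;

char[] sep = { ',', ':', ';' };
string values = "TechRepublic.com: CNET.com, News.com, Builder.com; GameSpot.com";

// Try several values for the count (maximum number of elements) parameter.
foreach (int count in new[] { 1, 2, 4, 10 })
{
    string[] parts = values.Split(sep, count);
    // The last element holds whatever remains once the limit is reached.
    Console.WriteLine($"count={count}: {parts.Length} element(s), last = \"{parts[parts.Length - 1]}\"");
}
```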

The Split method makes it easy to work with the individual elements of a string value. Sometimes, though, you need the reverse: assembling values into a predefined format such as comma-separated values. The String class makes this just as easy.

Putting it together
The Join method of the String class does the reverse of Split. Its first parameter is the separator string placed between values, and its second parameter is a string array containing the values to concatenate. One overload adds two integer parameters: the third specifies the index of the first array element to use, and the fourth specifies how many elements to use.

The following C# code sample demonstrates assembling the values used in the previous example:

string sep = ", ";
string[] values = new String[5];
values[0] = "TechRepublic.com";
values[1] = "CNET.com";
values[2] = "News.com";
values[3] = "Builder.com";
values[4] = "GameSpot.com";
string sites = String.Join(sep, values);
Console.Write(sites);

The following output is generated:

TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com

The equivalent VB.NET code follows:

Dim sep As String
sep = ", "
Dim values(4) As String
values(0) = "TechRepublic.com"
values(1) = "CNET.com"
values(2) = "News.com"
values(3) = "Builder.com"
values(4) = "GameSpot.com"
Dim sites As String
sites = String.Join(sep, values)
Console.Write(sites)
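Because Join reverses Split, splitting a string on a single character and joining the pieces with that same character reproduces the original string exactly. The minimal sketch below, which assumes C# 9 or later for top-level statements, checks that round trip on the article's sample string and then rejoins the pieces with a semicolon to convert the format.

```csharp
// Assumes C# 9 or later (top-level statements).
using System;

string original = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com";

// Split on the comma alone, so each piece keeps its leading space...
string[] pieces = original.Split(',');

// ...then join with the same single-character separator.
string rebuilt = String.Join(",", pieces);
Console.WriteLine(rebuilt == original);   // True

// Re-joining with a different separator converts the format.
string semicolons = String.Join(";", pieces);
Console.WriteLine(semicolons);   // TechRepublic.com; CNET.com; News.com; Builder.com; GameSpot.com
```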

You can use the overloaded version to specify where to begin and how many elements to include in the result. The following sample begins with the element at index 2 (the third element, because numbering begins at zero) and includes three elements, producing News.com, Builder.com, GameSpot.com:

Dim sep As String
sep = ", "
Dim values(4) As String
values(0) = "TechRepublic.com"
values(1) = "CNET.com"
values(2) = "News.com"
values(3) = "Builder.com"
values(4) = "GameSpot.com"
Dim sites As String
sites = String.Join(sep, values, 2, 3)
Console.Write(sites)
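The overloaded Join call above is written in VB.NET; a C# version might look like the following sketch, which assumes C# 9 or later for top-level statements. With a start index of 2 and a count of 3, the result contains the third through fifth sites.

```csharp
// Assumes C# 9 or later (top-level statements).
using System;

string sep = ", ";
string[] values = { "TechRepublic.com", "CNET.com", "News.com", "Builder.com", "GameSpot.com" };

// startIndex = 2 selects the third element (News.com); count = 3 takes three elements.
string sites = String.Join(sep, values, 2, 3);
Console.WriteLine(sites);   // News.com, Builder.com, GameSpot.com
```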

The starting index and the element count must describe a range that lies entirely within the string array. If either value is negative, or if the starting index plus the count exceeds the number of elements in the array, Join throws an ArgumentOutOfRangeException. For this reason, it is a good idea to validate these values first or to wrap the call in a try/catch block.
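The sketch below contrasts two ways of handling an invalid range: catching the ArgumentOutOfRangeException that Join throws, and validating the start index and count before calling Join. TryJoinRange is a hypothetical helper name used only for illustration, not part of the .NET Framework, and the sketch assumes C# 9 or later for top-level statements.

```csharp
// Assumes C# 9 or later (top-level statements, string interpolation).
using System;

string[] values = { "TechRepublic.com", "CNET.com", "News.com", "Builder.com", "GameSpot.com" };

// Approach 1: call Join directly and catch the exception.
bool caught = false;
try
{
    // Start index 3 plus count 5 runs past the end of the five-element array.
    String.Join(", ", values, 3, 5);
}
catch (ArgumentOutOfRangeException)
{
    caught = true;
}
Console.WriteLine($"Exception caught: {caught}");   // Exception caught: True

// Approach 2: validate the range first with the hypothetical Try-style helper.
bool ok = TryJoinRange(", ", values, 1, 2, out string joined);
Console.WriteLine(ok ? joined : "(invalid range)");   // CNET.com, News.com

// Hypothetical helper (not part of the .NET Framework): checks the same
// conditions String.Join enforces and reports failure instead of throwing.
static bool TryJoinRange(string separator, string[] items, int startIndex, int count, out string result)
{
    // Comparing startIndex with Length - count avoids overflowing startIndex + count.
    if (startIndex < 0 || count < 0 || startIndex > items.Length - count)
    {
        result = string.Empty;
        return false;
    }
    result = String.Join(separator, items, startIndex, count);
    return true;
}
```

Checking the range up front avoids the cost of an exception on a path that may be routine, while the try/catch form keeps the call site short when bad ranges are truly exceptional.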

While the String class provides the necessary methods, it isn't the only way to handle the parsing of a string value. Another common approach takes advantage of regular expressions.

Parsing with regular expressions
The .NET Framework provides the Regex class contained in the System.Text.RegularExpressions namespace for using regular expressions within a .NET application. Parsing is only one of the many applications of regular expressions.

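Let's examine parsing our sample string with a regular expression. The following ASP.NET page uses C# to split a comma-delimited list of sites into an array. The pattern matches a comma only when it is followed by an even number of double quotation marks, that is, only when the comma lies outside a quoted section. A quoted value that itself contains a comma therefore stays in one piece, which String.Split cannot do.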

<%@ Page Language="C#" Debug="true" %>
<%@ Import Namespace="System.Text.RegularExpressions" %>
<script language="C#" runat="server">
private void Page_Load(object sender, System.EventArgs e) {
    if (!IsPostBack) {
        string values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com";
        // Split on commas that are not inside double quotation marks
        string pattern = ",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))";
        Regex r = new Regex(pattern);
        string[] sites = r.Split(values);
        foreach (string s in sites) {
            Response.Write(s);
            Response.Write("<br>");
        }
    }
}
</script>

The equivalent VB.NET code follows. VB.NET does not treat the backslash as an escape character in string literals; instead, a quotation mark inside a string is written as two adjacent quotation marks (""). The C# escape sequence \" in the pattern therefore becomes "" in VB.NET.

<%@ Page Language="VB" Debug="true" %>
<%@ Import Namespace="System.Text.RegularExpressions" %>
<script language="VB" runat="server">
Sub Page_Load()
    If Not IsPostBack Then
        Dim values As String
        values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com"
        Dim pattern As String
        ' Each "" inside the literal is a single quotation mark in the pattern
        pattern = ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"
        Dim r As Regex
        r = New Regex(pattern)
        Dim sites As String()
        sites = r.Split(values)
        Dim s As String
        For Each s In sites
            Response.Write(s)
            Response.Write("<br>")
        Next s
    End If
End Sub
</script>
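The value of the pattern shows up when a field itself contains a comma inside quotation marks, which String.Split has no way to recognize. The following sketch compares the two methods on a hypothetical CSV line. It reuses the article's pattern with the static Regex.Split method instead of a Regex instance and assumes C# 9 or later for top-level statements.

```csharp
// Assumes C# 9 or later (top-level statements, string interpolation).
using System;
using System.Text.RegularExpressions;

// Hypothetical CSV line: the first field contains a comma inside quotation marks.
string line = "\"Patton, Tony\",TechRepublic.com,2006";

// The pattern from the article: a comma followed by an even number of quotes.
string pattern = ",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))";

string[] naive = line.Split(',');                   // also splits inside the quotes
string[] quoteAware = Regex.Split(line, pattern);   // skips the quoted comma

Console.WriteLine($"String.Split: {naive.Length} fields");       // String.Split: 4 fields
Console.WriteLine($"Regex.Split:  {quoteAware.Length} fields");  // Regex.Split:  3 fields
foreach (string field in quoteAware)
{
    Console.WriteLine(field);
}
```

Regex.Split leaves the quotation marks around the first field in place; a call such as field.Trim('"') removes them. The pattern also assumes the quotation marks in the line are balanced.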


