How to Parse URL Files

by Christopher Morley

Do you bookmark?

Do you use your browser's bookmarking?

I happen not to. Instead, I have taken a liking to a feature that all major browsers now have: dragging the icon that precedes the address in the address bar to the Desktop or to a folder. That way, I can stick my bookmarks wherever I like and dig them up later if I need to. I often do this when I find an article that I have only half-read, or something that I would like to revisit later. In fact, I often do not have time to revisit them, so these files end up all over the place.

Here is a typical example of the kind of mess I end up having to clean up:

See? Lots of little URL files everywhere. Not very usable.

What can we do with these files other than push them around our folders like we do with other files?

Does working with them have to be difficult? Can we read them programmatically, perhaps gather them, sort them out, and wrap them up?

Sometimes I want to parse these types of files into some other format, whether it is inserting them into a database, creating a bookmark list in a Word Document, creating a text file, or creating an Excel spreadsheet.

The primary things we care about are the URL Address itself and the Title of the page. At some point, we might also like to know the datetime the file was created on our machine.

Well, having wondered this, I found some solutions that can be implemented using .NET. (I write in C#; VB.NET fans, feel free to run the code through a converter.)

The first thing to know is that a common URL file is really nothing more than an INI file, which is a text file in a simple format. It is basically just a list of keys and values, one pair per line. Each key is separated from its value by an equals sign, and section names are wrapped in square brackets.

That information alone is enough to get started on. Let's see what we can do.

Here is a simple example:

[InternetShortcut]
URL=http://www.visionsfineart.com/ocampo/buddha.html

That's it? That's all there is? Okay. Well, this means that the file name itself is the title of the page. We need to keep that. Also, the datetime of the file is what we want for when this link was added to our file system.
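Since the format is this simple, it can also be read without any Windows API calls. The following is only a minimal sketch of that idea, not the approach used in the rest of this article: the class name IniSketch and the method Parse are made up for illustration, and it assumes the usual conventions (sections in square brackets, key=value lines, lines starting with a semicolon treated as comments).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the article's program.
static class IniSketch
{
    // Parses INI text into section -> (key -> value).
    // Section and key names are compared case-insensitively.
    public static Dictionary<string, Dictionary<string, string>> Parse(string text)
    {
        var sections = new Dictionary<string, Dictionary<string, string>>(StringComparer.OrdinalIgnoreCase);
        Dictionary<string, string> current = null;

        foreach (string rawLine in text.Split('\n'))
        {
            string line = rawLine.Trim();   // also drops the '\r' of Windows line endings
            if (line.Length == 0 || line.StartsWith(";")) continue;   // blank line or comment

            if (line.StartsWith("[") && line.EndsWith("]"))
            {
                string name = line.Substring(1, line.Length - 2).Trim();
                if (!sections.TryGetValue(name, out current))
                {
                    current = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
                    sections[name] = current;
                }
                continue;
            }

            int eq = line.IndexOf('=');
            if (eq > 0 && current != null)
            {
                // Split on the first '=' only, because URLs often contain '=' themselves.
                current[line.Substring(0, eq).Trim()] = line.Substring(eq + 1).Trim();
            }
        }
        return sections;
    }

    static void Main()
    {
        string sample = "[InternetShortcut]\r\nURL=http://www.visionsfineart.com/ocampo/buddha.html\r\n";
        Console.WriteLine(Parse(sample)["InternetShortcut"]["URL"]);
    }
}
```

Splitting on the first equals sign matters here, because a URL with a query string contains further equals signs that belong to the value.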

Here is a good article:

http://jachman.wordpress.com/2006/09/11/how-to-access-ini-files-in-c-net/

We can do it pretty much like that, and again, the name and the last write time serve as our other values.
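To make the "other values" concrete, here is a small sketch of reading the title and the timestamps from the file system. The class name ShortcutInfoSketch, the helper GetTitle, and the file name "Buddha painting.url" are invented for this illustration; the sketch writes a throwaway file to the temp folder so that it can run anywhere.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of the article's program.
static class ShortcutInfoSketch
{
    // The page title is the file name without its .url extension.
    public static string GetTitle(FileInfo file)
    {
        return Path.GetFileNameWithoutExtension(file.Name);
    }

    static void Main()
    {
        // Create a throwaway shortcut file (made-up name) in the temp folder.
        string path = Path.Combine(Path.GetTempPath(), "Buddha painting.url");
        File.WriteAllText(path, "[InternetShortcut]\r\nURL=http://www.visionsfineart.com/ocampo/buddha.html\r\n");

        FileInfo fi = new FileInfo(path);
        Console.WriteLine("Title: " + GetTitle(fi));
        Console.WriteLine("Last written: " + fi.LastWriteTime);
        Console.WriteLine("Created: " + fi.CreationTime);

        File.Delete(path);
    }
}
```

Which timestamp to use is a judgment call: the creation time is closer to "when I saved this link", while the last write time changes if the shortcut is edited later.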

Suppose we want to take a folder path and export a list of the websites represented by the URL files in that directory. The code below does that. Make sure to replace the path passed to DirectoryInfo with your directory of choice, and then do something more useful than print to the console.


using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            // Replace this path with your directory of choice.
            DirectoryInfo di = new DirectoryInfo(@"C:\Users\Chris\Desktop\some stuff to review");
            FileInfo[] files = di.GetFiles("*.url");

            foreach (FileInfo fi in files)
            {
                List<string> categories = GetCategories(fi.FullName);
                foreach (string category in categories)
                {
                    List<string> keys = GetKeys(fi.FullName, category);
                    foreach (string key in keys)
                    {
                        // INI key names are not case-sensitive.
                        if (string.Equals(key, "URL", StringComparison.OrdinalIgnoreCase))
                        {
                            string content = GetIniFileString(fi.FullName, category, key, null);
                            // The title is the file name without the .url extension.
                            Console.WriteLine(
                                "Title:" + Path.GetFileNameWithoutExtension(fi.Name) + "\r\n" +
                                "Datetime:" + fi.LastWriteTime + "\r\n" +
                                "URL:" + content + "\r\n");
                        }
                    }
                }
            }
            Console.ReadKey();
        }

        #region Useful for reading INI Files

        // Returns the section names in the file, e.g. "InternetShortcut".
        private static List<string> GetCategories(string iniFile)
        {
            return ReadNameList(null, iniFile, 65536);
        }

        // Returns the key names within one section.
        private static List<string> GetKeys(string iniFile, string category)
        {
            return ReadNameList(category, iniFile, 32768);
        }

        // With a null key name (and optionally a null section name), the API
        // fills the buffer with a list of names, each terminated by '\0'.
        private static List<string> ReadNameList(string category, string iniFile, int size)
        {
            char[] buffer = new char[size];
            int length = GetPrivateProfileString(category, null, null, buffer, size, iniFile);
            string raw = new string(buffer, 0, length);
            return new List<string>(raw.Split(new char[] { '\0' }, StringSplitOptions.RemoveEmptyEntries));
        }

        // Returns the value stored under one key of one section.
        private static string GetIniFileString(string iniFile, string category, string key, string defaultValue)
        {
            char[] buffer = new char[1024];
            int length = GetPrivateProfileString(category, key, defaultValue, buffer, buffer.Length, iniFile);
            return new string(buffer, 0, length);
        }

        // The return value is the number of characters copied into the buffer.
        // A char[] buffer is used because the name lists contain embedded '\0' characters.
        [DllImport("KERNEL32.DLL", EntryPoint = "GetPrivateProfileStringW",
            SetLastError = true,
            CharSet = CharSet.Unicode, ExactSpelling = true,
            CallingConvention = CallingConvention.StdCall)]
        private static extern int GetPrivateProfileString(
            string lpAppName,
            string lpKeyName,
            string lpDefault,
            [In, Out] char[] lpReturnString,
            int nSize,
            string lpFilename);

        // Not called in this program; declared for completeness.
        [DllImport("KERNEL32.DLL", EntryPoint = "WritePrivateProfileStringW",
            SetLastError = true,
            CharSet = CharSet.Unicode, ExactSpelling = true,
            CallingConvention = CallingConvention.StdCall)]
        private static extern int WritePrivateProfileString(
            string lpAppName,
            string lpKeyName,
            string lpString,
            string lpFilename);

        #endregion
    }
}


 

 

Here is a picture of some of the output:

 

With some help from this article, we could make this recursive.

http://support.microsoft.com/kb/303974
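One possible shape for the recursive search, sketched here with an explicit stack instead of a recursive method. This is an illustration under my own assumptions, not the code from the linked article: the names RecursiveSearchSketch and FindUrlFiles are hypothetical, and the sketch simply skips folders it is not allowed to read.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not part of the article's program.
static class RecursiveSearchSketch
{
    // Returns the full paths of all *.url files under root, including root itself.
    public static List<string> FindUrlFiles(string root)
    {
        var found = new List<string>();
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            try
            {
                found.AddRange(Directory.GetFiles(dir, "*.url"));
                foreach (string sub in Directory.GetDirectories(dir))
                {
                    pending.Push(sub);
                }
            }
            catch (UnauthorizedAccessException)
            {
                // Skip folders we are not allowed to read.
            }
            catch (DirectoryNotFoundException)
            {
                // Skip folders that disappeared during the scan.
            }
        }
        return found;
    }

    static void Main(string[] args)
    {
        // Search the folder given on the command line, or the current folder.
        string root = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();
        foreach (string file in FindUrlFiles(root))
        {
            Console.WriteLine(file);
        }
    }
}
```

Note that the files in the starting folder are collected too, not only those in its subfolders.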

Also, we'll want to write to a file, and perhaps make the output somewhat friendly for HTML.
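If the output is meant to be opened as HTML, titles and URLs that contain characters such as & or < should be encoded first. Below is a small sketch of one way to do that, assuming a framework version that includes System.Net.WebUtility (.NET Framework 4.0 or later). The names HtmlLineSketch and ToHtmlLine, the date format, and the example.com link are my own choices for illustration.

```csharp
using System;
using System.Globalization;
using System.Net;

// Hypothetical helper for illustration; not part of the article's program.
static class HtmlLineSketch
{
    // Builds one "date - link" line, encoding the URL and the title for HTML.
    public static string ToHtmlLine(DateTime when, string url, string title)
    {
        return when.ToString("yyyy-MM-dd HH:mm", CultureInfo.InvariantCulture)
            + " - <a href=\"" + WebUtility.HtmlEncode(url) + "\">"
            + WebUtility.HtmlEncode(title) + "</a><br />";
    }

    static void Main()
    {
        // Made-up link and title that contain characters needing encoding.
        Console.WriteLine(ToHtmlLine(
            new DateTime(2009, 7, 7, 9, 30, 0),
            "http://example.com/?a=1&b=2",
            "Tom & Jerry <fan page>"));
    }
}
```

Without the encoding step, a title containing an angle bracket could break the markup of the whole log page.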

 

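When you pass the two paths to the exe on the command line, put quotes around any path that contains spaces, and leave off the trailing backslash. A backslash directly before the closing quote is read as an escaped quote rather than as the end of the path.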

 

using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length != 2)
            {
                Console.WriteLine("usage: exe SearchPath LoggingPath");
                return; // nothing to search without both arguments
            }
            DirectoryInfo main = new DirectoryInfo(args[0]);
            DirSearch(main, args[1]);
        }

        // Writes one HTML line per URL found in the given .url file.
        private static void printUrl(FileInfo fi, String path)
        {
            List<string> categories = GetCategories(fi.FullName);
            foreach (string category in categories)
            {
                List<string> keys = GetKeys(fi.FullName, category);
                foreach (string key in keys)
                {
                    // INI key names are not case-sensitive.
                    if (string.Equals(key, "URL", StringComparison.OrdinalIgnoreCase))
                    {
                        string content = GetIniFileString(fi.FullName, category, key, null);
                        log(fi.CreationTime + " - <a href=\"" + content + "\">" + Path.GetFileNameWithoutExtension(fi.Name) + "</a><br />" + "\r\n", path);
                    }
                }
            }
        }

        // Appends the message to log.html in the logging folder.
        private static void log(String message, String path)
        {
            try
            {
                string fullFilename = Path.Combine(path, "log.html");
                // File.AppendText creates the file if it does not exist yet.
                using (StreamWriter sw = File.AppendText(fullFilename))
                {
                    sw.WriteLine(message);
                }
            }
            catch (Exception ex)
            {
                // If the log cannot be written, record the error in a separate file.
                Random r = new Random();
                string errorFile = Path.Combine(path, String.Format("error{0}.html", r.Next(10000000)));
                using (StreamWriter sw = File.AppendText(errorFile))
                {
                    sw.WriteLine(ex.ToString());
                }
            }
        }

        // Logs the .url files in sDir, then recurses into its subdirectories.
        private static void DirSearch(DirectoryInfo sDir, String path)
        {
            try
            {
                foreach (FileInfo f in sDir.GetFiles("*.url"))
                {
                    try
                    {
                        printUrl(f, path);
                    }
                    catch
                    {
                        // skip files that cannot be read
                    }
                }
                foreach (DirectoryInfo d in sDir.GetDirectories())
                {
                    DirSearch(d, path);
                }
            }
            catch
            {
                // skip directories that cannot be listed (for example, access denied)
            }
        }

        #region Useful for reading INI Files

        // Returns the section names in the file, e.g. "InternetShortcut".
        private static List<string> GetCategories(string iniFile)
        {
            return ReadNameList(null, iniFile, 65536);
        }

        // Returns the key names within one section.
        private static List<string> GetKeys(string iniFile, string category)
        {
            return ReadNameList(category, iniFile, 32768);
        }

        // With a null key name (and optionally a null section name), the API
        // fills the buffer with a list of names, each terminated by '\0'.
        private static List<string> ReadNameList(string category, string iniFile, int size)
        {
            char[] buffer = new char[size];
            int length = GetPrivateProfileString(category, null, null, buffer, size, iniFile);
            string raw = new string(buffer, 0, length);
            return new List<string>(raw.Split(new char[] { '\0' }, StringSplitOptions.RemoveEmptyEntries));
        }

        // Returns the value stored under one key of one section.
        private static string GetIniFileString(string iniFile, string category, string key, string defaultValue)
        {
            char[] buffer = new char[1024];
            int length = GetPrivateProfileString(category, key, defaultValue, buffer, buffer.Length, iniFile);
            return new string(buffer, 0, length);
        }

        // The return value is the number of characters copied into the buffer.
        // A char[] buffer is used because the name lists contain embedded '\0' characters.
        [DllImport("KERNEL32.DLL", EntryPoint = "GetPrivateProfileStringW",
            SetLastError = true,
            CharSet = CharSet.Unicode, ExactSpelling = true,
            CallingConvention = CallingConvention.StdCall)]
        private static extern int GetPrivateProfileString(
            string lpAppName,
            string lpKeyName,
            string lpDefault,
            [In, Out] char[] lpReturnString,
            int nSize,
            string lpFilename);

        // Not called in this program; declared for completeness.
        [DllImport("KERNEL32.DLL", EntryPoint = "WritePrivateProfileStringW",
            SetLastError = true,
            CharSet = CharSet.Unicode, ExactSpelling = true,
            CallingConvention = CallingConvention.StdCall)]
        private static extern int WritePrivateProfileString(
            string lpAppName,
            string lpKeyName,
            string lpString,
            string lpFilename);

        #endregion
    }
}