Golang & Web programming: “The World Aggregator” (1)

Last time, I introduced “The World Aggregator” project, a little web app I created to discover and better understand the Go language, and that I’m now presenting here in a “tutorial” style. In this second article, I will talk about Golang, its characteristics, and how we can use it to develop the server for our project.

In “The World Aggregator”, our goal is going to be to hit various URLs that have interesting data per country (population, CO2 emissions, GDP…), fetch data from them, preprocess this data and finally send it to a front-end to display it in a nice way (with maps or tables).

For now, let’s focus on the server and leave the pretty displaying part for the next article. Here, I will rather explain some fundamental ideas of Go and create a first working version of our server.

Disclaimer: this is a short tutorial of Golang – I may gloss over some details and I won’t introduce concepts that are not necessary for this particular project (even if there are some pretty neat ones worth checking out, like pointers, interfaces or variadic functions for example). However, this article is a bit longer than most because I still try to gradually build a full code example to play with.

Go: what is it?

In this first part, I will go over core concepts of Go – without applying them to our actual project yet – to give you a first feel of how the language works.

If you want a more in-depth explanation of these concepts, you can also take a look at the official tutorial, “A Tour of Go”.

Basics of Go

To begin with, let’s take a look at the core features and specificities of Golang compared to other languages (in particular C, JavaScript and Python).

Go is a compiled language, like C, as opposed to JavaScript or Python. This means that whenever you write Go code, you need the Go compiler to convert it to machine code and run it. To install Go, you can visit the website and follow the install instructions for your operating system.

Once you’re done, to check that everything is all right, just copy this piece of code in a file called test.go:

[snippet slug=072019_goweb-helloworld lang=golang]

And then run it using the shell command: go run test.go. If everything goes according to plan, you should see the text “Hello World!” appear in your terminal.

Alright, let’s now examine how this Go code is structured:

  1. at the top, you have the package main line: a key idea in Go is that files belong to “packages”, in other words groups of files that are logically linked and make a coherent whole; a package can be imported into a file to use the functions it contains
  2. then, you have imports: here, we import the fmt package from Go’s standard library (built-in with your installation) that provides I/O functions, like displaying text on the screen
  3. finally, we define our main() function: this will be run by Go at the very start, it is the entry point for our entire program

We’ve just written our first Go file: well done!

We see that Go is a curly-brace language (same as C or JavaScript), meaning that the statement blocks are surrounded by curly braces. This is different from Python, which uses indentation to define code blocks. Also, strings are delimited with double quotes (or backticks for raw strings); single quotes are reserved for single characters (runes), so we cannot use them for strings like in JavaScript and Python.

Basic I/O functions

Golang has some basic input and output functions in its standard library package fmt (as shown above). Those functions are quite similar to the ones we have in C, like printf, sprintf, fprintf…

I won’t do a comprehensive list of all the available functions, but here are some that you might use quite often:

  • fmt.Println(): prints a string to the standard output with no specific formatting; it’s a bit like Python’s print() function because you can pass in strings and variables separated by commas to print them one after the other with spaces in between
  • fmt.Printf(): prints a formatted string to the standard output (with formatters like in C or Python: %d, %f, %s…)
  • fmt.Fprintf(): same as the previous one but writes the result to an io.Writer you pass in, typically a file

As explained in the package’s doc, the 3 Printf(), Fprintf() and Sprintf() functions (that respectively print to the standard output, a file or a string variable) have their “ln” equivalents that do a similar thing but with no formatting and just spaces between the inputs (Println(), Fprintln(), Sprintln()).

To input values, you can use Scan() or Scanf() for example.
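To make the difference between these functions more concrete, here is a small sketch of my own. The describe() helper and the values passed to it are only placeholders for the example:

```go
package main

import (
	"fmt"
	"os"
)

// describe builds a string with Sprintf instead of printing it.
// (Hypothetical helper, only here for illustration.)
func describe(name string, count int) string {
	return fmt.Sprintf("%s has %d items", name, count)
}

func main() {
	// Println: values are separated by spaces, a newline is added at the end.
	fmt.Println("Hello", "from", "Println", 42)

	// Printf: C-style formatters, no automatic newline.
	fmt.Printf("%s: %.1f%%\n", "progress", 12.5)

	// Sprintf returns the formatted string rather than printing it.
	s := describe("basket", 3)

	// Fprintln writes to any io.Writer: os.Stdout here, but it could be a file.
	fmt.Fprintln(os.Stdout, s)
}
```

Passing os.Stdout to Fprintln gives the same result as Println; an opened file could be passed in exactly the same way.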

Static typing

If you are used to Python or JavaScript, this might be hard at first: Go is a statically typed language. This means that a variable is given one type when it is declared and that type cannot change afterwards. Function signatures, as we will see very soon, must include the types of their arguments and return values; you cannot perform mathematical operations between an int and a float directly; and the compiler will stop with an error if you try to use an incorrect type in place of another.

Also, compared to the other languages in the C family (which Go borrows lots of its syntax from), you first put the name and only then the type of your variable. We’ll come back to this “reverse order” when talking about functions, but if you’re too weirded out by this, you can read the article the Go team wrote to explain (and justify) their declaration syntax choice.

So, we could initialize some variables and play around with this typing system like this:

[snippet slug=072019_goweb-types lang=golang]

Compared to dynamically typed languages, it might seem like a burden at first, but it actually has its benefits:

  • it is way faster (because the compiler can perform some optimizations at compilation time and we already know what we are working with when we run the code)
  • it is less prone to type conversion errors (the famous change in the default behaviour of division between Python 2 and Python 3, for example, had me pull out my hair a few times to be honest!)
  • in Go’s case (and other languages such as JavaScript), you have a sort of type safety that prevents you from having type errors; on the other hand, C# is an example of a type-unsafe language because you can substitute a class for its child class without the compiler noticing, and C is also type-unsafe because you have direct access to pointers

Note: “type-safe” is not necessarily paired with “strongly typed” (see JavaScript: it is very weakly typed but is type-safe). Having a type system does not automatically ensure that you handle all the possible errors that might arise from mixing up types, reading outside the bounds of an array and so on.

However, Go also has type inference, which can make it feel a bit like a dynamically typed language. Indeed, when you initialize variables, you can either specify their type explicitly like in the previous bit of code, or you can let the compiler infer their type from the initial value you’re providing:

[snippet slug=072019_goweb-types2 lang=golang]

Be careful, though, because sometimes the inferred type might not be the one you expect!

[snippet slug=072019_goweb-types3 lang=golang]

If need be, you can convert a value from one type to another explicitly with conversions like int(), float64()…
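As a quick illustration of inference and explicit conversions working together, here is a minimal sketch. The ratio() and truncate() helpers are made up for this example:

```go
package main

import "fmt"

// ratio converts both ints to float64 before dividing, so the
// result keeps its decimal part.
func ratio(a, b int) float64 {
	return float64(a) / float64(b)
}

// truncate converts a float64 to an int; the decimal part is dropped.
func truncate(x float64) int {
	return int(x)
}

func main() {
	n := 7   // inferred as int
	f := 2.0 // inferred as float64
	fmt.Printf("%T %T\n", n, f)

	// "n / f" would not compile because the types differ:
	// one side has to be converted explicitly.
	fmt.Println(float64(n) / f)

	fmt.Println(7 / 2)          // both operands are ints: integer division
	fmt.Println(ratio(7, 2))    // floating-point division
	fmt.Println(truncate(3.99)) // no rounding, just truncation
}
```

The integer division is the kind of surprise mentioned above: the types are correct, but maybe not the ones you had in mind.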

More complex data types

Go offers quite a large palette of basic variable types: int, int32, int64, float32, float64, uint8, string, bool…

But to have variables with more complex data types, like a variable that would have several fields tightly linked together, we use “structs”, just like in C. For example, you could create a Point struct with an x and a y coordinate and build a variable of this type in the following way:

[snippet slug=072019_goweb-struct lang=golang]

The nice thing is that with structs, you can store data easily and even define specific functions that are linked to this struct, either for printing or for more interesting computation, which gives us some equivalent to the class system in C++ (we’ll talk more about functions in a little while):

[snippet slug=072019_goweb-struct2 lang=golang]

Note that, like with the C typedef keyword, you can also “rename” a type to call it more easily, with something like type PStruct *Struct for example.

Data structures

It is also interesting to take a look at the data structures that are available in Go to see how we can store all these variables.

First, you have arrays. An array contains a fixed number of values (all of the same type) stored next to each other in memory. To create an array, you simply put brackets with the size of the array and then the type of variables in the array; you can access the elements by index like in C, Python or JavaScript (Go is 0-indexed):

[snippet slug=072019_goweb-datastruct lang=golang]

Once you have declared it, you cannot resize an array: to work around this limitation, we can use slices.

Slices are dynamically sized “views” of elements in an array. We can create them either from scratch, using the built-in make function, or based on a previously declared array with a Python-like indexing syntax – notice that we don’t put a number between the brackets because we are not constrained to a fixed size anymore:

[snippet slug=072019_goweb-datastruct2 lang=golang]

It is possible to append elements to a slice, to look at a part of a slice with the same sort of indexing, to get the length or the capacity of a slice with the len() and cap() functions…
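Here is a small sketch of these operations, as an illustration of my own. The evens() helper is hypothetical and only there to show append() inside a loop:

```go
package main

import "fmt"

// evens returns a new slice holding only the even values of nums.
func evens(nums []int) []int {
	var out []int // nil slice: length 0, ready for append
	for _, n := range nums {
		if n%2 == 0 {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	arr := [5]int{1, 2, 3, 4, 5} // array: the size is part of the type
	view := arr[1:4]             // slice viewing the elements at index 1, 2 and 3
	fmt.Println(view, len(view), cap(view))

	s := make([]int, 0, 2)    // length 0, capacity 2
	s = append(s, 10, 20, 30) // append grows the slice beyond its initial capacity
	fmt.Println(s, len(s))

	fmt.Println(evens(arr[:])) // arr[:] is a slice covering the whole array
}
```

Note that append() returns the (possibly reallocated) slice, which is why its result is always assigned back.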

Since you can also create slices of slices, you can easily create a multidimensional slice:

[snippet slug=072019_goweb-datastruct3 lang=golang]

The last type of data structure that is commonly used is the map. It’s a bit like dictionaries in Python because it is a key-value pairing structure. This means that instead of accessing elements by index, you access them by key. A map is declared by specifying its key type and its value type (and if you want to declare it without initializing its elements immediately, you can use the built-in make function once again).

After the map has been created, you can mutate it by modifying elements, deleting elements, checking if there is a specific key…

[snippet slug=072019_goweb-datastruct4 lang=golang]
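As a complementary sketch (with made-up keys and a hypothetical countLetters() helper), the usual map operations look like this:

```go
package main

import "fmt"

// countLetters maps each character of s to its number of occurrences.
func countLetters(s string) map[rune]int {
	counts := make(map[rune]int)
	for _, r := range s {
		counts[r]++ // a missing key reads as the zero value, here 0
	}
	return counts
}

func main() {
	stock := map[string]int{"apples": 3, "pears": 5} // placeholder data
	stock["plums"] = 7                               // add an element
	stock["apples"]++                                // modify an element
	delete(stock, "pears")                           // delete an element

	// The "comma ok" form tells us whether the key is present.
	if _, ok := stock["pears"]; !ok {
		fmt.Println("no pears left")
	}

	fmt.Println(len(stock), stock["apples"])
	fmt.Println(countLetters("hello"))
}
```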

Doing loops

In Golang, there is only one type of loop, the for loop. In its default form, it looks a lot like C’s for loop. It is built from 3 components:

  • the init statement: executed before the iteration
  • the condition statement: evaluated before every iteration
  • the post statement: executed at the end of every iteration

So a basic for loop in Golang has the following form:

[snippet slug=072019_goweb-loops lang=golang]

The init and post statements are optional: you can keep only the condition statement (the semi-colons can then be dropped too). In fact, you don’t have a while keyword in Go because this type of loop can be constructed just by leaving out the init and post statements. Also, it is very easy to build an infinite loop in Go: just leave out all 3 statements and have a for directly followed by braces.

A small difference with C and JavaScript though: the curly braces are mandatory and you don’t put parentheses around the 3 statements.
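To see these three forms side by side, here is a minimal sketch; the function names are invented for the example:

```go
package main

import "fmt"

// sumTo adds the integers from 1 to n with the classic three-part loop.
func sumTo(n int) int {
	total := 0
	for i := 1; i <= n; i++ {
		total += i
	}
	return total
}

// halvings counts how many times n can be halved before reaching 0,
// using the condition-only form (what other languages call "while").
func halvings(n int) int {
	count := 0
	for n > 0 {
		n /= 2
		count++
	}
	return count
}

// firstSquareAbove uses an infinite loop and leaves it with a return.
func firstSquareAbove(limit int) int {
	i := 0
	for {
		if i*i > limit {
			return i
		}
		i++
	}
}

func main() {
	fmt.Println(sumTo(4), halvings(8), firstSquareAbove(10))
}
```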

On the other hand, you also have some constructs that more closely resemble Python, thanks to the range keyword. It’s a way of iterating over a data structure and can be used with a for loop pretty easily:

[snippet slug=072019_goweb-loops2 lang=golang]

Note: I will talk about using the underscore character to “forget” a variable in just a bit.

Writing functions

Overall, writing functions in Golang is quite straight-forward when you’ve done some programming before, especially if you’ve written some C or C++ and are used to typed languages. In Go, you have a func keyword, a function name, some input parameters and some return values.

What might surprise you is that:

  • you can return multiple values (by simply passing multiple return types in your function’s prototype and multiple values to your return statement)
  • you can aggregate parameters of the same type together (but you cannot aggregate unnamed return types, though: if you return two integers without naming them, you must write out the int type twice)
  • the function’s prototype is written in a strange order

What do I mean by that, “strange order”? Well it’s a bit like the variables where you first give the name and then the type; for example, here is a basic prototype for a function that takes in two integers, one float, and returns two floats:

[snippet slug=072019_goweb-func lang=golang]

To me, at first, it felt like Golang had everything reversed! Having done some C before, I was used to having the return type at the very beginning of the line, then my input parameters with first the type and then the name. Here, it works the other way around: your prototype contains first the func keyword, then the function name, then the input parameters with the name followed by the type, and finally the return type(s). Also, notice how you can group together two parameters of the same type (like a and b, in the above example).

In Go you don’t declare a prototype on its own and you always follow this line with the actual body of your function, inside of curly braces:

[snippet slug=072019_goweb-func2 lang=golang]

If you’re a fan of Fortran, you can name your returned values so that these variables are automatically sent back when the function reaches a return statement with no arguments.
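Here is a small sketch of named return values, written for illustration only (minMax() is a hypothetical helper):

```go
package main

import "fmt"

// minMax names its return values: lo and hi start at their zero value
// and a bare "return" sends back whatever they currently hold.
func minMax(values []float64) (lo, hi float64) {
	if len(values) == 0 {
		return // sends back 0, 0
	}
	lo, hi = values[0], values[0]
	for _, v := range values[1:] {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	return
}

func main() {
	lo, hi := minMax([]float64{3, 1, 2})
	fmt.Println(lo, hi)
}
```

Notice that once the return values are named, they can be grouped by type just like parameters.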

As we’ve seen before, it is possible to apply functions to structs directly, too, so as to have a sort of class system. Here is a small article on the topic if you want to learn more.

Error handling

Before diving into our “World Aggregator” project, let’s cover a last key thing of Go: the way errors are handled.

The language provides a built-in error type that is used by lots of functions from the standard library (like methods for file opening, string parsing…). Since a function can return multiple values in Go, it is very easy to return the actual value you’re interested in and an error flag that the user can check to know if the function ran seamlessly. This is why you will often see bits of code like that in Go:

[snippet slug=072019_goweb-error lang=golang]

This means that Go has no exception handling system by default and that you need to take care of the errors yourself.

But Golang has a built-in panicking system that makes sure that whenever a function crashes, the program exits. You can use the panic() function and pass it a message to create a panic yourself and stop your code if it reaches some critical lines.

An important thing however is that when in a panic, the program will still unwind the call stack one function at a time and execute the deferred statements of each function, if there are any. The defer keyword is thus a way to force a line of code to be executed upon exit of a function, whether the function ended correctly or not. For more details on defer, panic and recover, take a look at this blog post from Go.
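As a rough sketch of how defer and recover() fit together, here is one possible pattern among others. The safeDivide() function is invented for this example: it relies on the runtime panic that Go raises on an integer division by zero.

```go
package main

import "fmt"

// safeDivide recovers from the runtime panic triggered by an integer
// division by zero and reports it as an ordinary error instead.
func safeDivide(a, b int) (result int, err error) {
	// The deferred function runs when safeDivide exits, panic or not.
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return a / b, nil
}

func main() {
	defer fmt.Println("deferred: printed when main returns")

	fmt.Println(safeDivide(6, 3))
	fmt.Println(safeDivide(1, 0))
}
```

Without the recover() call, the second division would stop the whole program after running the deferred statements.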

Finally, a quick trick to use functions that return an error without handling it: as you know, the Go compiler will refuse to build your program, with an “unused variable” error, if you get back the error flag but don’t check it; on the other hand, it will also refuse to compile if you try to get back just the actual value while the function is supposed to return two things. So, is there a solution? Yes! The way to import, or declare something in Go without using it afterwards, is to store it into a _ variable (the underscore character).
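For instance, with strconv.Atoi() from the standard library (which returns an int and an error), the two approaches could look like this; parseOrZero() and parseStrict() are names I made up for the sketch:

```go
package main

import (
	"fmt"
	"strconv"
)

// parseOrZero discards the error with the underscore; when the parsing
// fails, Atoi returns 0, which is passed along silently.
func parseOrZero(s string) int {
	n, _ := strconv.Atoi(s)
	return n
}

// parseStrict checks the error and hands it back to the caller.
func parseStrict(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("cannot parse %q: %w", s, err)
	}
	return n, nil
}

func main() {
	fmt.Println(parseOrZero("42"), parseOrZero("abc"))
	fmt.Println(parseStrict("abc"))
}
```

Ignoring errors is handy for quick experiments, but the second form is usually the safer one in real code.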

Applying all this to “The World Aggregator” project

Now that we have seen the basics of Go, let’s see how we can apply this to our project. This will also be an opportunity to discuss more advanced concepts of Golang: package management, concurrency and HTTP serving.

Fetching some data and parsing it

Go’s packages (and getting the htmlquery package)

As you know, some languages or frameworks are famous for their ecosystem – NodeJS and Python are well-known for offering thousands of modules you can install to fit almost all your needs!

For the past years, people have complained because installing packages in Go was not always easy. Compared to the ease of use of NodeJS’ package manager (npm), you had to find the package and install it in the right place – you needed to hook everything up yourself. Well, Go does not yet offer a true package manager (even though some ideas are being discussed in GitHub threads) but the go command now integrates a go get option to download a package directly from a GitHub repository and put it in the right place. So, to create a Go package and publish it, all you need to do is put it in a GitHub repository!

To get a package this way, simply run a command like this:

go get github.com/githubauthor/mysuperpackage

This will grab the content of the mysuperpackage GitHub repository and put it in your $GOPATH/src folder. This is located in different places depending on your OS; on Mac for example, this should be your $HOME/go/src folder by default.

Note: here, I focus on non-executable packages that are just a folder of files with interesting functions you can import. Executable packages, on the other hand, are compiled beforehand and can then be executed as binaries – they must contain the main() function entry point we mentioned earlier. They are whole programs on their own, rather than the non-executable ones that simply enhance a program by adding more functionalities to it.

I won’t talk about package installation and initialization, either, but this article is a nice reference about packages in Go.

As we saw before, a package is simply a group of files that define a set of logically related functions. The first line in your file indicates which package “scope” you are in, which determines which functions and variables you can access. Multiple files might share variables if they have the same scope, but whatever is outside of this scope will need to import the package to have access to some of its content. Remember though that when you have imported a package, you can only access the functions and variables that are exposed in it: they are easy to spot because they start with a capital letter. For example, the function MyFunction() will be exported from my package, but not myFunction().

Having package scopes also helps with unique declarations because you cannot define the same function or variable twice inside of the same package; but two packages can define a function or variable with the same name since their scopes are different! This is also why you cannot directly import functions from packages like you would in Python (with the from my_package import my_func statement): to prevent scope confusion, you always have to give the name of the package you call the function or variable from when you use it. On the other hand, if you try to import two packages that have the same name, the compiler will refuse to build the program because there will be a conflict!

To avoid this issue, or if your package name is quite long and you don’t want to type it each time you call a function or a variable from it, you can precede your import with a shorter alias for it:

[snippet slug=072019_goweb-pkgalias lang=golang]

Quick note: you can use the trick of the underscore once again to import a package even though you don’t want to use it quite yet (e.g.: during your development phase)!

Since a package is truly just a folder that can contain subpackages (basically, subfolders), to access a package nested inside of a package, you simply use relative paths. For example, Go’s standard library contains the math package, and this package contains a rand subpackage that can be imported with import "math/rand".

Something really important is that to use the functions inside rand, you actually need to import math/rand and not just math – you cannot access a subpackage afterwards like you would in Python with the . character.
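Here is a tiny sketch that imports this subpackage, with an alias on top to show both ideas at once. The rnd alias and the rollDie() function are arbitrary choices for the example:

```go
package main

import (
	"fmt"
	rnd "math/rand" // full path to the subpackage, with a shorter alias
)

// rollDie returns a pseudo-random integer between 1 and 6.
func rollDie() int {
	// Intn(6) returns a value in [0, 6), so we shift it by one.
	return rnd.Intn(6) + 1
}

func main() {
	fmt.Println(rollDie())
}
```

Without the alias, the call would simply be written rand.Intn(6): the last element of the import path is the name used in the code.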

Alright, this being said, the package we want to use here is called htmlquery. To import it, we’ll just do a go get github.com/antchfx/htmlquery. We will see how to import and use it in the next part.

Fetching and parsing some HTML content

To get a first taste of how to download and extract information from a web page, let’s take a look at Go’s website homepage:

We are going to fetch the HTML content from this page and parse it to get the content of the small program example on the right (the code box) using XPath. XPath, or XML Path Language, is a query language to select parts of XML or HTML content. If we examine the elements in the page, we see that for the part that we are interested in, the HTML is structured with a section, some divs and a textarea (here, I have also denoted in orange the name of the class of the section):

Now that we know how this part of the page is structured, we can construct an XPath query string to isolate the text we want. I won’t do a full XPath tutorial here (W3Schools has a straight-forward enough tutorial if you want); I will just give a high-level overview of how we will build our string:

  1. access the first element in the page of type section which has a class “Playground”
  2. access its second div child
  3. finally, get the textarea in it

This results in this XPath:

//section[contains(@class, "Playground")]/div[2]/textarea

The first double slash is a way of telling the parser not to start at the top of the file but to search the whole page for elements that match the request. We also see that XPath starts indexing at 1 and not 0. Apart from that, the rest is a translation of our previous structure analysis.

Thanks to the htmlquery package, it is very easy to load HTML content from a URL. To do so, we just need this short code:

[snippet slug=072019_goweb-xpath lang=golang]

We see that to import the package, we just need to tell the compiler it comes from the github.com subfolder. The compiler will try to load it first from Go’s standard library package folder, then it will try $GOPATH/src and find it there. This code prints out some geeky thing in the console, similar to this:

&{<nil> 0xc000494540 0xc0004945b0 <nil> <nil> 2    []}

Basically, this tells us that it went well and that our doc variable now contains the content we want (otherwise, the program would have panicked and crashed). But of course, this is not very readable; and, to be honest, printing the entire HTML content from this page would be a bit long.

But applying our XPath query is quite simple:

[snippet slug=072019_goweb-xpath2 lang=golang]

The parser will put the results in a slice, even if we only get one element here – this is why we inspect the inner text of results[0] (and not results directly). Also, we need to make sure that we escape the double quotes inside the XPath. If all goes well, you should see this printed out in the console:

// You can edit this code!
// Click here and start typing.
package main
import "fmt"

func main() {
  fmt.Println("Hello, 世界")
}

We just extracted data from a web page, yay!

Storing the data

We are able to get data but we still need to think about how we will store it so that we can pass it on to our front-end afterwards. We are going to take advantage of Go’s structs, slices and maps.

I’ve spotted a few URLs that hold per-country data about various topics: population, renewable energy production, Internet access… Feel free to use your own if you want (or if this tutorial gets out of date and these URLs disappear for some reason!). Each of these pages has a list of country names with an associated value:

We are going to group our results into a CountryData struct. We are also going to define a CountryDataset struct that contains a slice of CountryData variables, a float MinValue and a float MaxValue. Here are the struct definitions:

[snippet slug=072019_goweb-struct_def lang=golang]

This should be straight-forward if you have grasped the concepts presented so far. Now, if we choose the first URL I’m proposing here, we can extract the population of each country with the following code (trust me on the XPath if you want, or build it yourself by analyzing the page’s structure); it’s a bit long but shouldn’t be too hard to understand with the previous examples:

[snippet slug=072019_goweb-project lang=golang]

This is starting to look good: we can still refine lots of things but we are already fetching content from a given URL, extracting data with an XPath, gathering everything with our structs and even doing some processing to get the min and max values in the dataset.
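To isolate the min/max processing from the fetching part, here is a stripped-down sketch that does not touch the network. The struct names come from the text above, but the field names Country, Value and Data are my assumptions, as are the newDataset() helper and the sample values:

```go
package main

import "fmt"

// CountryData holds one country-value pair.
// (Field names are assumptions for this sketch.)
type CountryData struct {
	Country string
	Value   float64
}

// CountryDataset groups the entries with their minimum and maximum.
type CountryDataset struct {
	Data     []CountryData
	MinValue float64
	MaxValue float64
}

// newDataset wraps the entries and computes MinValue and MaxValue in one pass.
func newDataset(entries []CountryData) CountryDataset {
	ds := CountryDataset{Data: entries}
	for i, e := range entries {
		// The first entry initializes both bounds.
		if i == 0 || e.Value < ds.MinValue {
			ds.MinValue = e.Value
		}
		if i == 0 || e.Value > ds.MaxValue {
			ds.MaxValue = e.Value
		}
	}
	return ds
}

func main() {
	// Made-up values, only there to exercise the code.
	ds := newDataset([]CountryData{
		{Country: "A", Value: 12.5},
		{Country: "B", Value: 3},
		{Country: "C", Value: 40},
	})
	fmt.Println(ds.MinValue, ds.MaxValue)
}
```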

Making the code more flexible

A big issue with our current code is that it is tuned specifically for our reference URL. In particular, the way we extract information from the page with an XPath is highly dependent on the way the HTML content is structured, so it is dependent on the URL itself.

To solve this problem, we are going to define yet another struct, this time to represent the “context” of our query: which URL we want to hit and which function we should use to extract results. We will also make a type alias ExtractionFunction for the sort of functions we use for parsing, so that our struct is easier to write out. Then, we’ll define a map of those structs to store multiple search contexts:

[snippet slug=072019_goweb-struct_def2 lang=golang]

There are quite a number of things going on here, so let’s recap:

  • first, we import the golang.org/x/net/html package to be able to access the html.Node type in our extraction function prototype and definition (these nodes represent HTML elements in a web page)
  • then, we define our ExtractionFunction alias type: an extraction function takes some HTML content (a pointer to an html.Node) and returns 2 sets of html.Node pointers, one for the country names and one for the associated values at that URL
  • the SearchContext struct contains a field for the URL to fetch and an extraction function
  • finally, our (currently short) map of SearchContext variables is the list of URLs we will want to scrape and the function that should be called on the HTML content once it has been fetched
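The idea of storing a function in a struct field can be tried out with the standard library alone. In the simplified sketch below, the extraction function works on plain text instead of html.Node pointers, and the field names URL and Extract, the extractPairs() function and the example.com address are all placeholders of mine:

```go
package main

import (
	"fmt"
	"strings"
)

// ExtractionFunction is a simplified stand-in here: it takes plain text
// rather than a pointer to an html.Node.
type ExtractionFunction func(content string) (names, values []string)

// SearchContext pairs a URL with the function that knows how to read it.
// (Field names are assumptions for this sketch.)
type SearchContext struct {
	URL     string
	Extract ExtractionFunction
}

// extractPairs reads one "name=value" pair per line and skips other lines.
func extractPairs(content string) (names, values []string) {
	for _, line := range strings.Split(content, "\n") {
		parts := strings.SplitN(line, "=", 2)
		if len(parts) != 2 {
			continue
		}
		names = append(names, strings.TrimSpace(parts[0]))
		values = append(values, strings.TrimSpace(parts[1]))
	}
	return names, values
}

func main() {
	contexts := map[string]SearchContext{
		"demo": {URL: "https://example.com/data", Extract: extractPairs}, // placeholder URL
	}
	for name, ctx := range contexts {
		// This string stands in for the content fetched from ctx.URL.
		names, values := ctx.Extract("A=1\nB=2")
		fmt.Println(name, ctx.URL, names, values)
	}
}
```

Each entry in the map can carry a different function, which is what makes the approach flexible when the pages are structured differently.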

We can now reintegrate this in our previous file and add a for loop to browse all our URLs:

[snippet slug=072019_goweb-project2 lang=golang]

Merging with a simple HTTP server and doing some HTML templating

The web server (in Go)

A nice thing with Go is that the standard library offers some utilities to set up an HTTP server super easily!

To create a web server, we simply need to:

  • import the net/http standard package
  • create a server and link its root address to a function that returns some basic content
  • launch the server in the main() function

This is done with just those few lines of code:

[snippet slug=072019_goweb-server lang=golang]

If you save this as server.go for example, run this code in a terminal (with go run server.go) and go to http://localhost:8000/, you should see a page with “Hello server!” displayed at the top left.

Of course, this very basic server is not really interesting: all URLs on localhost:8000 will fall back on this page and we don’t display much. We can improve this a bit by defining another route and passing in more complex HTML:

[snippet slug=072019_goweb-server2 lang=golang]

But this is no way to actually create a complete HTML page! To do this, we will use Golang’s HTML templating feature.

The front-end display (in HTML and Go templating language)

If you have ever done Flask applications in Python, or some PHP, you are probably familiar with the concept of HTML templating. The idea is to define an overall structure for your web page with some specific “ready-to-fill” elements that will feed off the server’s response to be populated upon loading.

Suppose you have a very basic page that just shows a “Hello world!” message with a greeting specific to the user (based on the parameter you entered in your URL).

On the server side, we have two possibilities:

  1. we do something very basic and reinject the variable directly when sending back the response to the client
  2. we do something that will be more scalable by preparing an HTML template with placeholders and having the server fill it in with the value before sending the response

The first option is quite easy to derive from the basic Go server example; I won’t detail this. Instead, let’s focus on the second option. On the server side, we will have a simple HTTP server that also extracts a parameter from the URL and uses a template rather than returning HTML directly:

[snippet slug=072019_goweb-template lang=golang]

For the page itself, we only need to prepare a basic HTML template file:

[snippet slug=072019_goweb-template2 lang=html]

As you can see, in Go’s templating syntax, every element that should be filled with a Go variable is inside two curly braces. The dot is a way of telling the template to access the entire data it is fed (here, we only send one string, but when we start to send more complex objects, we will be able to access their inner fields starting from the dot).

If I run the Go server and go to the URL: http://localhost:8000/?name=Bob, I get a page with a “Hello world!” title and a label saying “Greetings Bob!”.

The nice thing is that we can do more complex things like browse a struct field by field, iterate over a slice or a map… Don’t hesitate to take a look at the package’s doc to learn more about this!
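As a small taste of this, here is a sketch that renders a template into a string rather than into an HTTP response, using html/template from the standard library. The Row and PageData types, the render() helper and the template text are invented for the example:

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// Row and PageData are hypothetical types for this sketch.
type Row struct {
	Country string
	Value   int
}

type PageData struct {
	Title string
	Rows  []Row
}

// page is parsed once; template.Must panics if the template is malformed.
var page = template.Must(template.New("page").Parse(
	`<h1>{{.Title}}</h1><ul>{{range .Rows}}<li>{{.Country}}: {{.Value}}</li>{{end}}</ul>`))

// render executes the template into a string builder.
func render(data PageData) (string, error) {
	var sb strings.Builder
	if err := page.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render(PageData{
		Title: "Demo",
		Rows:  []Row{{Country: "A", Value: 1}, {Country: "B", Value: 2}},
	})
	if err != nil {
		fmt.Println("render error:", err)
		return
	}
	fmt.Println(out)
}
```

Inside the range block, the dot refers to the current Row, which is why .Country and .Value can be accessed directly.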

Let’s apply this to our project. We know the structure of the data that comes from our server:

  • the server gives a list of datasets each mapped to a string that is its name (the value linked to each country at this URL)
  • each dataset contains a list of country name-value pairs, a minimum value and a maximum value

We will simply print it as a table for now, using Go’s templating range tool:

[snippet slug=072019_goweb-project3 lang=html]

This will iterate over all the datasets we get from the server and create a title, some text for the dataset overall info and a table; then, inside of each iteration, we also iterate over the data in the dataset to create new table rows.

Finally, we can reintegrate all the HTTP server in our Go code: basically, we will mix the HTTP serving and template injection with the data fetching and processing. We will also reorganize things a bit by putting some of our logic into smaller functions – this will make it easier to improve afterwards:

[snippet slug=072019_goweb-project4 lang=golang]

Now, let’s refresh the page; after a few seconds (depending on your network connection…), we get two (long) tables showing the values per country for each dataset. Here we are: we have a first working version of The World Aggregator!

A last little improvement: adding some concurrency!

Before we end this article, let’s write a last nice improvement for our server making use of Golang’s ability to handle concurrency. Now, you should be aware that concurrency is not the same as parallelism:

  • in a parallel execution, multiple threads are asked to carry out multiple tasks at the same time (so you have multiple “agents” to do the work)
  • in a concurrent execution, you have only one “agent” but the work is cut into small bits so that the worker can switch from one to another very quickly and carry out all the tasks in an overlapping time frame

If you want a more detailed (and probably better!) explanation on the difference between parallelism and concurrency, I encourage you to watch Rob Pike’s talk on “Concurrency Is Not Parallelism” (he’s one of the 3 creators of Go, remember?).

Alright, that being said, let’s see how adding concurrency to our current program can improve its performance. First, let’s examine how long the data takes to be fetched and sent to our page. To do so, we can go to our HTML page and open up the console of our browser (it depends on the browser but you often have a menu where you have “Developer options” and you can switch it on – in Safari, you need to go to your System Preferences and enable this menu). Here, there should be a “Network” tab that shows you all the network traffic that happens on your web page. When we load the page, we see that actually most of the time is spent waiting for the data from our server to be prepared (i.e. for the information to be scraped and processed) and the download is then really quick.

This is because, with this current version, our server is processing each URL sequentially; this is wasteful, for two reasons:

  • each request has to wait for the previous one to be entirely done – if something blocks at one point, the rest down the chain will be blocked too
  • our requests are independent: we do not need information from any other URL to grab the data from a page, so nothing forces us to process them in order

All of this points us in the direction of concurrent programming. Rather than processing each URL one after the other, we should start them all at once and then gather all the results. This figure is a schematic representation of the difference between sequential and concurrent data fetching:

Schematic representation of the data fetching from multiple URLs in sequential or concurrent mode

With Go, this is really easy to do! We are going to use two of the main concurrency tools of Go: “wait groups” and “channels”. Simply put, the former let us register a set of goroutines that run at the same time and then wait for all of them to complete before going on with the rest of the program: every goroutine runs on its own, but we do not move on until the last one is done. The latter are a way of passing values between goroutines; a buffered channel behaves like a queue that holds the results one after the other until we read them.
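To get a feel for these two tools before applying them to our project, here is a minimal sketch that is not part of The World Aggregator: squareAll is a made-up helper that squares each number in its own goroutine, collects the results in a buffered channel and uses a sync.WaitGroup to wait for everyone. The results are sorted at the end because goroutines may finish in any order.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// squareAll is a hypothetical helper used only for illustration: it squares
// each input in its own goroutine and returns the results in ascending order.
func squareAll(inputs []int) []int {
	// One buffer slot per input, so no goroutine ever blocks on its send.
	results := make(chan int, len(inputs))
	var wg sync.WaitGroup

	for _, n := range inputs {
		wg.Add(1) // one more goroutine to wait for
		go func(n int) {
			defer wg.Done() // signal completion when this goroutine returns
			results <- n * n
		}(n)
	}

	wg.Wait()      // block until every goroutine has called Done
	close(results) // no more sends: lets the range loop below terminate

	var out []int
	for r := range results {
		out = append(out, r)
	}
	sort.Ints(out) // goroutines finish in no particular order
	return out
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4})) // prints [1 4 9 16]
}
```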

We will need to do 5 things:

  • create a channel (a queue) to store our results
  • call our data fetching function as a “goroutine” instead of a normal function
  • modify it a bit so that it makes use of the channel and stores enough metadata about the data source for further processes
  • handle the goroutines waiting with a WaitGroup
  • iterate over the channel to “unpack” our results and create the actual CountryDataset variables we need

First, let’s introduce a new struct called GatheredData that will hold the data we fetched and some interesting context values to restore later on; we will add more fields in the next article, but for now it is a very simple struct:

[snippet slug=072019_goweb-0 lang=golang]
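As a rough idea of what such a struct can look like, here is a small sketch. The field names (Name, URL, Data) are my assumptions for this example, and CountryData is redefined in a simplified form so that the block compiles on its own; the values in main() are placeholders, not real figures.

```go
package main

import "fmt"

// CountryData is assumed here to pair a country name with one numeric value;
// it is redefined only to keep this sketch self-contained.
type CountryData struct {
	Country string
	Value   float64
}

// GatheredData bundles the rows scraped from one source with some context.
// The field names are hypothetical.
type GatheredData struct {
	Name string        // label of the dataset
	URL  string        // page the rows were scraped from
	Data []CountryData // the scraped rows
}

func main() {
	// Placeholder values, not real data.
	g := GatheredData{
		Name: "Dataset A",
		URL:  "https://example.com/dataset-a",
		Data: []CountryData{{Country: "Country A", Value: 1}},
	}
	fmt.Printf("%s: %d row(s) from %s\n", g.Name, len(g.Data), g.URL)
}
```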

Then, let’s create our channel in our serveIndex() function; it will carry such GatheredData items. We will need one buffer slot per URL we want to scrape, so we create a channel with a fixed buffer size given by the number of URLs to browse:

[snippet slug=072019_goweb-channel lang=golang]
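To illustrate what a fixed buffer size means, here is a small sketch with a hypothetical newResultsChannel() helper and a stripped-down GatheredData. As long as the buffer has room, a send returns immediately; len() gives the number of items currently waiting and cap() the buffer size.

```go
package main

import "fmt"

// GatheredData is reduced to a single field to keep this sketch short.
type GatheredData struct {
	Name string
}

// newResultsChannel is a hypothetical helper: it returns a channel with
// exactly one buffer slot per URL.
func newResultsChannel(urls []string) chan GatheredData {
	return make(chan GatheredData, len(urls))
}

func main() {
	// Placeholder URLs, used only for their count.
	urls := []string{"https://example.com/a", "https://example.com/b"}
	queue := newResultsChannel(urls)

	// Neither send blocks because the buffer has room for both items.
	queue <- GatheredData{Name: "a"}
	queue <- GatheredData{Name: "b"}

	fmt.Println(len(queue), cap(queue)) // prints 2 2
}
```

A third send on this channel would block until someone reads from it, which is why the buffer is sized from the number of URLs.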

Now, we can modify our getData() function to make use of this channel – notice that we change the signature of the function: instead of returning a slice of CountryData at the end like before, we now send the GatheredData variable directly into the channel:

[snippet slug=072019_goweb-channel2 lang=golang]
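The shape of this change can be sketched as follows. Everything here is simplified and partly invented: scrape() is a stand-in that returns canned rows instead of doing a real HTTP request, and the struct fields are assumptions. The point is only the new signature, where the function takes a send-only channel and returns nothing.

```go
package main

import "fmt"

type CountryData struct {
	Country string
	Value   float64
}

type GatheredData struct {
	URL  string
	Data []CountryData
}

// scrape is a hypothetical stand-in for the real fetching and parsing code;
// it returns placeholder rows.
func scrape(url string) []CountryData {
	return []CountryData{{Country: "Country A", Value: 1}}
}

// getData no longer returns its rows: it wraps them with their source URL
// and sends the result into the channel it was given.
func getData(url string, queue chan<- GatheredData) {
	queue <- GatheredData{URL: url, Data: scrape(url)}
}

func main() {
	queue := make(chan GatheredData, 1)
	getData("https://example.com/a", queue) // the buffer has room, so this returns
	item := <-queue
	fmt.Println(item.URL, len(item.Data)) // prints https://example.com/a 1
}
```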

Finally, we can update our serveIndex() to call our getData() function as a goroutine (using the keyword go), and to then go through the queue to grab the results:

[snippet slug=072019_goweb-channel3 lang=golang]

Last but not least, we need to use a WaitGroup to make sure that all the goroutines have finished before we read their results. This requires four small changes in our code:

  1. we need to create our global WaitGroup variable
  2. in our getData() function, we need to add at the very top a defer line to alert the WaitGroup that when this function finishes, a goroutine is done and should be counted as such
  3. in our serveIndex() function, we need to tell the WaitGroup that whenever we start a new goroutine, the number of goroutines to wait for increases by 1
  4. also in the serveIndex() function, we need to wait for all the goroutines in the WaitGroup to actually finish before getting back the results

Here are the corresponding bits of code:

[snippet slug=072019_goweb-channel4 lang=golang]
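Put together, the four changes can be sketched like this. It is a self-contained approximation rather than the project's actual code: scrape() returns canned rows, gatherAll() stands in for the relevant part of serveIndex(), and the URLs are placeholders.

```go
package main

import (
	"fmt"
	"sync"
)

type CountryData struct {
	Country string
	Value   float64
}

type GatheredData struct {
	URL  string
	Data []CountryData
}

// 1. The global WaitGroup.
var wg sync.WaitGroup

// scrape is a hypothetical stand-in for the real fetching and parsing code.
func scrape(url string) []CountryData {
	return []CountryData{{Country: "Country A", Value: 1}}
}

func getData(url string, queue chan<- GatheredData) {
	// 2. When this function returns, count its goroutine as done.
	defer wg.Done()
	queue <- GatheredData{URL: url, Data: scrape(url)}
}

// gatherAll plays the role of the data gathering part of serveIndex().
func gatherAll(urls []string) []GatheredData {
	queue := make(chan GatheredData, len(urls))
	for _, url := range urls {
		wg.Add(1) // 3. One more goroutine to wait for.
		go getData(url, queue)
	}
	wg.Wait()    // 4. Block until every goroutine is done.
	close(queue) // nothing else will be sent, so the range below can end

	var results []GatheredData
	for item := range queue {
		results = append(results, item)
	}
	return results
}

func main() {
	urls := []string{"https://example.com/population", "https://example.com/gdp"}
	for _, item := range gatherAll(urls) {
		fmt.Println(item.URL, len(item.Data))
	}
}
```

One design remark: a global WaitGroup keeps the example short, but since an HTTP server may run several handlers at once, declaring the WaitGroup inside the handler and passing a pointer to getData() is probably a safer variant.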

Quite easy to do with Go, right? Now, it’s time to see the impact of these changes on our application. If you reload the page, you should see that it takes about half the time it did before to load the data.

Keep in mind that, while this does not make a huge difference in our current case, if we had many more URLs to browse, we would see it is way better to use the concurrent version!

Conclusion on the Go server & Next article teaser

Here we are: a first working version of our Go server! All in all, thanks to many tools from the language’s standard library, the code is not that long (less than 200 lines for complete data fetching, processing and HTTP serving!). I haven’t written out all the extraction functions we should use for each URL in the aforementioned list but, if you want, you can have a go – no pun intended! – at analyzing the HTML of those pages… or wait for the article next week that will end with the entire source code for the project.

Next time, we’ll improve the front-end and, in particular, we’ll implement a page to show the data with maps. We will also add some metadata in our CountryDataset and GatheredData structs to record things like the unit in which our value is expressed, or a specific postprocessing function to apply to our values (this will help us deal with a very large range of values like the one we have for worldwide population, for example).

  1. Golang’s website: https://golang.org/
  2. Golang’s “A Tour of Go” tutorial: https://tour.golang.org/list
  3. d3.js’ website: https://d3js.org/
  4. npm’s website: https://www.npmjs.com/
  5. Go’s standard library fmt package’s doc: https://golang.org/pkg/fmt/
  6. Go’s standard library html/template package’s doc: https://golang.org/pkg/html/template/
  7. htmlquery Go package’s Github: https://github.com/antchfx/htmlquery
  8. W3 Schools’ tutorial on XPath: https://www.w3schools.com/xml/xpath_intro.asp
  9. The Go Blog, “Go’s Declaration Syntax” (https://blog.golang.org/gos-declaration-syntax), July 2010. [Online; last access 27-July-2019].
  10. GoLang Tutorials, “Methods on structs” (http://golangtutorials.blogspot.com/2011/06/methods-on-structs.html), June 2011. [Online; last access 28-July-2019].
  11. The Go Blog, “Defer, Panic and Recover” (https://blog.golang.org/defer-panic-and-recover), August 2010. [Online; last access 27-July-2019].
  12. sentdex, “Go Language Programming Practical Basics Tutorial” (https://www.youtube.com/playlist?list=PLQVvvaa0QuDeF3hP0wQoSxpkqgRcgxMqX), Nov. 2017. [Online; last access 27-July-2019].
  13. Wiktionary, “Curly-bracket language” (https://en.wiktionary.org/wiki/curly-bracket_language#English), June 2017. [Online; last access 27-July-2019].
  14. Github’s thread on Go versioned package management: https://github.com/golang/proposal/blob/master/design/24301-versioned-go.md
  15. U. Hirawale, “Everything you need to know about Packages in Go” (https://medium.com/rungo/everything-you-need-to-know-about-packages-in-go-b8bac62b74cc), Jul. 2018. [Online; last access 27-July-2019].
  16. Wikimedia Foundation, “XPath” (https://en.wikipedia.org/wiki/XPath), July 2019. [Online; last access 27-July-2019].
  17. A. Edwards, “Serving Static Sites with Go” (https://www.alexedwards.net/blog/serving-static-sites-with-go), Jan. 2017. [Online; last access 27-July-2019].
  18. Wikimedia Foundation, “Web template system” (https://en.wikipedia.org/wiki/Web_template_system), July 2019. [Online; last access 27-July-2019].
  19. R. Pike, “Concurrency Is Not Parallelism” (https://vimeo.com/49718712), 2012. [Online; last access 27-July-2019].
