Linux Programming Blog Entry 2 - Threads and Sockets: jevin

Dec 20, 2010 20:03

Sockets are used to interface with networks on both UNIX- and Windows-based systems. I'll be focusing on sockets as UNIX provides them. Some of these functions are more useful for clients (depending on the responsibilities of the client) and others are more useful for servers (again, it varies). I'll try to note which are useful for one or the other in a simple server/client model.

This first one is a good one for clients: gethostbyname() resolves a host name and gives you back its IP address(es).

struct hostent *gethostbyname(const char *name);

struct hostent {
    char  *h_name;       // official name of the host
    char **h_aliases;    // NULL-terminated list of aliases
    int    h_addrtype;   // type of the address (AF_INET for IPv4)
    int    h_length;     // length of each address in bytes
    char **h_addr_list;  // NULL-terminated list of addresses from the name server
};
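
To make the lookup concrete, here is a minimal sketch of how a client might turn a host name into printable IPv4 text. The helper name resolve_ipv4 is my own invention for illustration, not part of any library, and it only looks at the first address in the list. Newer code usually prefers getaddrinfo(), so treat this as a sketch of the older call rather than a recommendation.

```c
#include <arpa/inet.h>   /* inet_ntop, inet_addr */
#include <assert.h>
#include <netdb.h>       /* gethostbyname, struct hostent */
#include <netinet/in.h>  /* INADDR_LOOPBACK */
#include <stddef.h>
#include <sys/socket.h>  /* AF_INET, socklen_t */

/* Hypothetical helper: resolve `name` to its first IPv4 address and write it
 * to `out` as dotted-decimal text. Returns 0 on success, -1 on failure. */
int resolve_ipv4(const char *name, char *out, size_t outlen)
{
    assert(name != NULL && out != NULL);

    struct hostent *he = gethostbyname(name);
    if (he == NULL || he->h_addrtype != AF_INET || he->h_addr_list[0] == NULL)
        return -1;

    /* Each h_addr_list entry points at a raw address in network byte order. */
    if (inet_ntop(AF_INET, he->h_addr_list[0], out, (socklen_t)outlen) == NULL)
        return -1;
    return 0;
}
```

A caller would pass a buffer of at least INET_ADDRSTRLEN bytes. If the name is already a dotted IPv4 address, gethostbyname() just copies it through without doing a lookup.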

We need to know how to form addresses. There's a structure defined for this:

struct sockaddr_in
{
    sa_family_t    sin_family;  // address family (AF_INET)
    in_port_t      sin_port;    // port number, in network byte order
    struct in_addr sin_addr;    // internet address, in network byte order
};
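
Filling in one of these by hand might look like the sketch below. The function name make_ipv4_addr is hypothetical; the calls it uses (memset, htons, inet_pton) are the standard ones. The part that is easy to forget is the byte order, so the port goes through htons().

```c
#include <arpa/inet.h>    /* inet_pton, htons, ntohs */
#include <assert.h>
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>       /* memset */
#include <sys/socket.h>   /* AF_INET */

/* Hypothetical helper: fill `out` with an IPv4 address and port.
 * Returns 0 on success, -1 if `ip` is not valid dotted-decimal text. */
int make_ipv4_addr(const char *ip, unsigned short port, struct sockaddr_in *out)
{
    assert(ip != NULL && out != NULL);

    memset(out, 0, sizeof *out);      /* clear padding and unused fields */
    out->sin_family = AF_INET;
    out->sin_port = htons(port);      /* host to network byte order */

    /* inet_pton returns 1 only when the text was a valid address. */
    if (inet_pton(AF_INET, ip, &out->sin_addr) != 1)
        return -1;
    return 0;
}
```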

To create a socket you will need to use a system call called 'socket'. The function header for this is:

int socket(int addr_family, int socket_type, int protofamily);

The address family is the type of address we want the socket to use (AF_INET for IPv4, for example).
The socket type generally refers to 'stream sockets' (SOCK_STREAM) or 'datagram sockets' (SOCK_DGRAM).

Stream sockets form a connection between two endpoints and transfer data between them, guaranteeing that the data arrives at the other side in order, from the first byte to the last (or that you get an error if it can't be delivered).
Datagram sockets are the opposite: each datagram may arrive out of order or, perhaps, not at all.

The protofamily argument tells the system which protocol to use (TCP or UDP, for example); leaving it at 0 lets the system pick the default for the chosen address family and socket type.

Once you've created a socket, socket() returns a file descriptor (which can be used with system calls like write() and read()), or -1 if it fails.

So once we've created a socket, a server needs to bind() it to a local address. (A client usually skips this step and names the foreign address with connect() instead.) The function header for bind is:

int bind(int socket, const struct sockaddr *addr, socklen_t addrlen);

(sockaddr is the generic form of sockaddr_in. A pointer to a sockaddr_in can be cast to a pointer to sockaddr.)

There are a few different ways to bind:

The first lets you bind the socket to a specific IP address and port.

The second binds to a specific IP address but lets the system choose the port (pass port 0).

The third specifies a port but accepts any local IP address (pass INADDR_ANY as the address).

The fourth allows the use of any local IP address and any port (INADDR_ANY and port 0).
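
All four kinds of bind can be covered by one small function, sketched below. The names bind_ipv4 and bound_port are hypothetical, and the convention that a NULL address means "any local address" is just my choice for this example. getsockname() is the standard way to find out which port the system handed you.

```c
#include <arpa/inet.h>    /* inet_pton, htons, htonl, ntohs */
#include <assert.h>
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_ANY */
#include <string.h>       /* memset */
#include <sys/socket.h>   /* bind, getsockname */

/* Hypothetical helper: bind `fd` to an IPv4 address and port.
 * ip == NULL  means any local address (INADDR_ANY).
 * port == 0   means let the system choose the port.
 * Returns 0 on success, -1 on error. */
int bind_ipv4(int fd, const char *ip, unsigned short port)
{
    assert(fd >= 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);

    if (ip == NULL)
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
    else if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;

    /* bind() takes the generic sockaddr type, hence the cast. */
    return bind(fd, (struct sockaddr *)&addr, sizeof addr);
}

/* Hypothetical helper: the port `fd` is bound to, in host byte order,
 * or -1 on error. */
int bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) == -1)
        return -1;
    return ntohs(addr.sin_port);
}
```

The four cases then correspond to calling bind_ipv4 with (address, port), (address, 0), (NULL, port) and (NULL, 0).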

Unix implements POSIX threads (pthreads) to support, of course, parallel processing within a single process.

pthread_create() creates a new thread and takes the following arguments:

int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg);

Here are the arguments listed out in order:

1. A pointer to a pthread_t variable, in which the thread ID of the new thread is stored.
2. A pointer to a thread attribute object, or NULL for the default attributes.
3. A pointer to the thread function. This is the function the thread will execute. It takes a function pointer that has a void* argument and a void* return type.
4. A thread argument value of type void*.

After you call pthread_create, the current thread (usually the main thread) continues executing as normal, while the newly created thread starts running the function you gave it (the one with the void* return type).

If you want to pass data to threads you can use the 4th argument. It is a single void*, so you can't pass much data directly, but you can pass a pointer to an array or a structure holding whatever the thread needs.
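
As a rough sketch of passing a structure through that 4th argument, the example below bundles an array, its length and a slot for the result into one struct. The struct and function names are made up for illustration. The thread is joined before the function returns because the struct lives on the caller's stack.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical bundle of everything the worker thread needs. */
struct sum_job {
    const int *values;
    size_t count;
    long total;          /* written by the worker */
};

/* Thread function: receives the struct through the void* argument. */
static void *sum_worker(void *arg)
{
    struct sum_job *job = arg;
    job->total = 0;
    for (size_t i = 0; i < job->count; i++)
        job->total += job->values[i];
    return NULL;
}

/* Sum `values` in a separate thread. Returns 0 on success (result in
 * *total), or the error number from the pthread call that failed. */
int sum_in_thread(const int *values, size_t count, long *total)
{
    assert(values != NULL && total != NULL);

    struct sum_job job = { values, count, 0 };
    pthread_t tid;

    int err = pthread_create(&tid, NULL, sum_worker, &job);
    if (err != 0)
        return err;

    /* `job` is on this stack frame, so wait for the worker to finish
     * before returning and letting the frame go away. */
    err = pthread_join(tid, NULL);
    if (err != 0)
        return err;

    *total = job.total;
    return 0;
}
```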

The second argument refers to thread attributes. A thread attribute gives a programmer slightly more fine-grained control over the behavior of a thread. To give a thread an attribute, you declare a pthread_attr_t object, call pthread_attr_init to initialize it, and then set whichever attributes you want on that object.

In pthread_create you can pass this attribute object (as the second argument) to modify the thread's behavior. An example of an attribute one might modify is the thread's 'detached' state. A detached thread's resources are released automatically when it terminates, unlike a joinable thread (the default), whose resources are not released until another thread calls pthread_join on it.
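
Here's a minimal sketch of the declare, init, modify, create sequence for a detached thread. The function names spawn_detached and background_job are hypothetical; the pthread_attr_* calls are the standard ones.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Placeholder thread function for the example; does no real work. */
void *background_job(void *arg)
{
    (void)arg;
    return NULL;
}

/* Hypothetical helper: start `fn(arg)` in a detached thread.
 * Returns 0 on success, or the error number from the failing call. */
int spawn_detached(void *(*fn)(void *), void *arg)
{
    assert(fn != NULL);

    pthread_attr_t attr;
    pthread_t tid;

    int err = pthread_attr_init(&attr);
    if (err != 0)
        return err;

    /* Ask for a thread that cleans up after itself when it terminates. */
    err = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (err == 0)
        err = pthread_create(&tid, &attr, fn, arg);

    /* The attribute object is no longer needed once the thread exists. */
    pthread_attr_destroy(&attr);
    return err;
}
```

Since nobody joins a detached thread, whatever `arg` points at has to stay valid for as long as the thread might use it.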

Another nifty function is pthread_join(thread_to_wait_for, optional_exit_status). The thread that calls pthread_join blocks until the named thread terminates; if the second argument is not NULL, the terminated thread's return value is stored through it. This comes in handy for making sure that memory passed to a thread isn't deallocated while the thread is still using it.
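
To show the exit-status half of pthread_join, this sketch has the worker return a heap-allocated result that the joining thread picks up. The names square_worker and square_in_thread are invented for the example, and returning a malloc'd int is only one of several ways to hand back a value.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Thread function: squares the int it is given and returns the result
 * in freshly allocated memory (NULL if the allocation fails). */
static void *square_worker(void *arg)
{
    const int *input = arg;
    int *result = malloc(sizeof *result);
    if (result != NULL)
        *result = (*input) * (*input);
    return result;       /* this pointer becomes the thread's exit status */
}

/* Compute n*n in another thread. Returns 0 and stores the answer in *out
 * on success, -1 on any failure. */
int square_in_thread(int n, int *out)
{
    assert(out != NULL);

    pthread_t tid;
    void *status = NULL;

    if (pthread_create(&tid, NULL, square_worker, &n) != 0)
        return -1;

    /* Blocks until the worker finishes; `n` stays valid in the meantime. */
    if (pthread_join(tid, &status) != 0)
        return -1;
    if (status == NULL)
        return -1;

    *out = *(int *)status;
    free(status);        /* the joiner owns the returned memory */
    return 0;
}
```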

These are a few basic thread functions. There are plenty more (and many problems to consider when using threads), but I might have to wait to make a post about those as I'm getting a bit lazy. Here's a list of a few of those topics:

Race conditions
Mutex locks
Critical sections
Semaphores
Signals+threads and how they're handled

I will end by noting that threads and forking processes can be used for many of the same things but have differences that make them both worth considering when building a software package:

Threads have to run the same executable that they're created in. Child processes can call exec() and can, therefore, run a separate executable.
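
For the process side of that comparison, a fork-then-exec sketch might look like this. The function name run_and_wait is hypothetical, and using 127 as the "exec failed" exit code is just a common shell convention I've borrowed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid, WIFEXITED, WEXITSTATUS */
#include <unistd.h>     /* fork, execvp, _exit */

/* Hypothetical helper: run the program named by argv[0] in a child
 * process and return its exit code, or -1 on failure. */
int run_and_wait(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: replace this process image with a different executable. */
        execvp(argv[0], argv);
        _exit(127);      /* only reached if exec failed */
    }

    /* Parent: wait for the child and decode its status. */
    int status = 0;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with an argv of { "sh", "-c", "exit 7", NULL } would, assuming sh is on the PATH, return the child's exit code.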

Threads that are created share the same memory space, file descriptors, etc. So if one thread modifies a location in memory the change is visible to other threads. Threads, however, have their own call stacks.
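
A quick way to see the difference is to bump a global from a thread and then from a forked child, as in this sketch. The variable and function names are made up for the demonstration.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid */
#include <unistd.h>     /* fork, _exit */

static int counter = 0; /* lives in the process's memory */

static void *bump(void *arg)
{
    (void)arg;
    counter++;          /* same memory the creating thread sees */
    return NULL;
}

/* Value of `counter` seen by the caller after a thread increments it,
 * or -1 if the thread could not be created or joined. */
int counter_after_thread(void)
{
    pthread_t tid;
    counter = 0;
    if (pthread_create(&tid, NULL, bump, NULL) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return counter;
}

/* Value of `counter` seen by the parent after a child process increments
 * its own copy, or -1 if fork or waitpid fails. */
int counter_after_fork(void)
{
    counter = 0;
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        counter++;      /* modifies the child's private copy only */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) == -1)
        return -1;
    return counter;
}
```

The thread's increment shows up in the caller, while the child's increment stays in the child.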

Copying memory for a new process adds performance overhead. However, with copy-on-write, a page is only copied when one of the processes writes to it, so you'd need to weigh the trade-off for your workload.

Threads are better for fine-grained control over parallelism. They allow you to parallelize functions, modules, etc., and because they share memory you avoid any copy overhead that forking a child might incur. Processes offer more coarse-grained control.

Please note that, if something isn't clear in any of these posts (or if you find a mistake) let me know. These are primarily to help me remember/learn things I've not seen in a while and, I hate to admit it, but I am not the best at revising these as I rarely feel like re-reading them over and checking my data against my reading material...