Distributed Systems
Interprocess Communication
Interprocess Communication (IPC)
- Middleware
- low layer: supports basic IPC
- next layer: high level communication paradigm RMI, RPC
Overview
Java APIs for Internet Protocols
- UDP
- message passing abstraction
- processes transmit a single datagram to a receiving process
- best effort
- no guarantees
- TCP
- abstraction of 2-way stream
- streams have no message boundaries
- basis of producer/consumer communication
- transparent recovery
- higher overhead than UDP
- reliable
- if connection fails, exception is produced
Data Representation
- how objects/data are translated into suitable form for sending as messages over
network
- receiver needs to be able to decode what it receives
Higher level Protocols
- request-reply protocols: client-server
- group multicast protocol: group communication
API for IP
- processes use two message communication functions:
send
,receive
- queue associated with each message destination
- receive side: OS is producer, process is consumer
- synchronous communication: both
send
andreceive
are blocking- when a
send
is issued, sending process is blocked untilreceive
is issued - when a
receive
is issued, process blocks until a message arrives
- when a
- asynchronous communication:
send
is non-blocking- sending process returns as soon as the message is copied to a local buffer
- transmission of the message proceeds in parallel
receive
usually blocking, but can be non-blocking
- non-blocking
receive
: provides buffer to be filled in the background- needs an interrupt/polling to be notified when the buffer is filled
- may be more efficient, requires more complex code to acquire incoming message
- blocking
receive
: when you can have multiple threads in a single process (e.g. in Java), there are no disadvantages, as one thread can issue the blocking call while other threads remain active
Communication | send |
receive |
---|---|---|
Synchronous | blocking | blocking |
Asynchronous | non-blocking | blocking (usually) |
- producer/consumer: linked blocking queue
- producer:
offer()
: look at queue - if full, returns False and doesn’t add to the queue- otherwise adds data to the queue
- non-blocking
put()
: blocks until it can be put in the queue- causes context switch
- consumer:
take()
: blocks until there’s something to take from the queue- causes context switch
peek()
: non-blocking look at the first element without removing itpoll()
: returnsnull
if empty, or 1st item from queue (removing it)- non-blocking
- NB: different to synchronous protocol: send a message and don’t do anything else until reply received (doesn’t mean thread is blocked)
- Node uses non-blocking calls and is single-threaded
Sockets
- socket: provides end point for communication between processes
- to receive messages, its socket must be bound to a local port on one of the Internet addresses of the host
- same socket can be used for both sending/receiving
- each socket is associated with single protocol: TCP/UDP
Java Internet Address
IntAddress
: class encapsulating Internet address- call
getByName
to get an instance - throws
UnknownHostException
IntAddress aComputer = IntAddress.getByName("registermachine.com")
UDP datagram Communication
- server (receiver) binds its socket to a server port (known to the client)
- client (sender) binds socket to any free port
- receive method returns Internet address/port of the sender with the message
- this allows replies to be sent
- message size
- receiving process defines array of bytes to receive message
- if too big message is truncated
- practical limit 8kB
- protocol allows packets up to $2^16$ bytes
- barebones: low overhead
- e.g. DNS, VoIP
Blocking
- non-blocking
send
s - blocking
receive
s - message delivered to message buffer of socket bound to the destination port
- invocations of receive on the socket collect the messages
- messages discarded if no socket bound to the port
Timeouts
receive
waits indefinitely until messages received- can set timeouts on sockets to exit from infinite waits and check condition of
sender e.g.
Thread.interrupt()
receive
allows receiving from any port- can be restricted to given IP addr/port
Possible failures
- data corruption: detected with checksum
- omission failures: buffers full, corruption, dropping
- order: messages may be delivered out of order
Java API
DataGramPacket
- 2 constructors for sending or for receiving
getData()
getPort()
getAddress()
DatagramSocket
- constructors: port number/no argument
send()
receive()
setSoTimeout()
connect()
- see textbook for client/server e.g.
TCP Stream Communication
- message sizes: no limit on data size
- lost messages: acknowledgement scheme retransmits unacknowledged packets
- flow control: receive window; match speed between sender/receiver
- congestion control: prevent congestion collapse of network
- duplication/ordering: sequence numbers ensure duplicates are rejected and reordering occurs as necessary
- destinations: connection established before communication
- e.g. HTTP, FTP, Telnet, SMTP
Establishing TCP stream socket
- client:
- create socket with server address + port
- read/write data using stream associated with socket
- server:
- create listening socket bound to server port
- wait for clients to request connection: listening socket maintains a queue of incoming connection requests
- server accepts a connection and creates new stream socket for the server to communicate with the client
- pair of sockets (client/server) now connected by pair of streams, one in each direction. A socket has an input stream and an output stream
Closing a socket
- data in output buffer sent to other end with indication stream is broken
- no further communication possible
Issues
- need pre-agreed format for data sent
- blocking is possible at both ends
- if the process supports threads, best approach is to assign a thread to each connection so that other clients are not blocked
Failure model
- checksum: detect/reject corrupt packets
- sequence number: detect/reject duplicates
- timeout + retransmission: lost packets
- severe congestion: TCP streams declare connection broken
- breaks reliable communication
- communication broken: processes cannot distinguish between process failure and process crash
- communicating processes cannot definitely say whether messages sent recently were received
- clean exit: very confident all data received correctly
Java API
ServerSocket
- used to create a listening socket
accept()
: gets connect request from queue, returnsSocket
instanceaccept()
: blocks until connection arrives
Socket
- used by pair of processes with a connection
- client: uses constructor specifying DNS hostname:port, creating a socket bound to a local port and connects to remote computer
getInputStream()
getOutputStream()
- see textbook for TCP client/server
External Data Representation and Marshalling
- files are either binary or text format
- CORBA, Java’s object serialisation: binary
- binary file formats
- faster,
- more flexible,
- uses less memory
- texted based file formats:
- easy to interpret
- application specific tags can be constructed and parsed with external parsers/libraries
- data structures need to be flattened to a sequence of bytes for transmission or storage
- approaches to allow computers to interpret data
- use agreed external format
- transmit in senders format, with indication of format used
- external data representation: agreed standard for representing data structures
and primitive data
- CORBA common data representation
- Java serialization
- JSON
- XML
- marshalling: process of converting data to form suitable for transmission
- unmarshalling: disassembling data at receiver
- lots of validation required to ensure it conforms to expected format
CORBA’s Common Data Representation
- 15 primitive data types: short, long, unsigned short, …, float, double, char, boolean, octet, any
- primitives can be sent in big-endian/little-endian orderings
- values are sent in sender’s ordering, which is specified in the message
- marshalled data only includes values of objects transmitted, not information concerning its type: common knowledge at sender/receiver about types of data items in the message
- constructed types: primitive types combined in order
- marshalling: performed by middleware
- operations can be automatically generated from data type specification defined in CORBA interface definition language (IDL)
- CORBA interface compiler generates marshalling/unmarshalling operations
Java Object serialization
- serialisation: convert an object into a byte stream for storage/transmission
- deserialisation: restoring that state of an object from a serialised form
- information about the class is included in the serialization (name, version)
- all objects it references are serialized with it
- the byte stream created is platform independent
- references are serialised as handles
- contents of primitive instance variables that are primitive types are written in a portable
format using portable format using
ObjectOutputStream
methods- Strings/characters written using
writeUTF()
- Strings/characters written using
public class Person implements Serializable {...}
- anonymous functions aren’t usually serialisable
- can be very inefficient
- can also implement
Externalizable
- programmer needs to implement flattening methods
- potentially much more efficient
- during RMI: arguments are results are serialized/deserialized by middleware
- reflection: permits automatic de-/serialization
transient
: Java won’t transmit that variable
XML Extensible Markup Language
- markup language: textual encoding representing data and details of the structure/appearance
- XML
- markup language defined by World Wide Web Consortium (W3C)
- tags describe logical structure
- extensible: additional tags can be defined
- tags are generic; c.f. HTML, where tags give display instructions
- self-describing: tags describe the data
- textual: human-readable, platform independent
- textual: messages are large, so lots of processing, storage, transmission time
- SOAP: XML format whose tags are published for use by web services and their clients
<person id="123456789">
<name>Smith</name>
<place>London</place>
<year>1934</year>
<!-- comment -->
</person>
Elements, attributes
- elements: data surrounded by tags e.g.
<name>Smith</name>
- hierarchical representation via nesting of elements
- empty tag with no contents terminated with
/>
e.g.
- attributes: start tag optionally contains attributes: name + value
- e.g.
id="123456789"
- e.g.
- either can be used to represent data
- substructures can only be represented with elements
- attributes can only be used for simple data types
Namespace
- namespace: set of names for a collection of element types and attributes
- referenced by a URL
- can be specified with attribute
xmlns
with value of the URL for the file containing namespace definition- e.g.
xlmns:pers = "http://abc.def/person"
- e.g.
<person pers:id="123456789" xlmns:pers ="http://abc.def/person">
<pers:name>Smith</pers:name>
<pers:place>London</pers:place>
<pers:year>1934</pers:year>
<!-- comment -->
</person>
Schema
- XML schema: defines elements/attributes that can appear in a document
- how elements are nested
- order/number of elements
- whether an element is empty/can include text
- intention: single schema definition shared by many documents ```xml
```
JSON JavaScript Object Notation
- becoming dominant format used today
- supplanting XML
- MongoDB uses JSON derivative
- lightweight
- text-based
- easy to program with
- easy to understand
-
easy to parse
- syntax diagrams at www.json.org
Parsing Numbers
- there is no limit to the number of digits: this makes it difficult to represent (e.g. in a database). What storage class should you use for this?
- e.g. Twitter ran out of digits for id’s, so added a string version of id
- MongoDB created BSON (Binary JSON) to address numeric issues
- this can sometimes break downstream processing that expects vanilla JSON
- can lock you in to proprietary formats (technology lock-in)
JSON vs XML
- JSON
- lightweight: fewer bytes to represent the same information, reducing memory use and data transmission
- supports arrays
- XML
- has attributes for metadata within tags. To do this in JSON requires adding an extra field
Group Communication
- multicast operation allows group communication
- send single message to a number of processes, identified as a group
- can happen with/without delivery guarantees
- one packet is sent out of 1 computer, and is received by multiple parties
- routers are responsible for routing multiple copies
Uses
- propagation of event notification: e.g. pub-sub, Facebook. When a status changes, all friends receive notification
- fault tolerance when used with replicated services: client requests get multicast to all members of the group
- even when some members fail, the client can still be served
- discovering services: multicast used by servers/clients to locate discovery services to register interfaces etc.
- better performance through replicated data: data are replicated to increase performance
- updated data are multicast to processes managing replicas
IP Multicast: Java API
MulticastSocket
: subclass ofDatagramSocket
joinGroup()
leaveGroup()
- IP multicast: built on top of IP
- lets sender transmit a single packet to a set of computers forming a group
- sender unaware of individual recipients, only group
- group identified by class D IP address (224.x.x.x)
- router may not be configured to allow multicast
- while you may be able to use it locally, ISPs do not allow it
API
- IP multicast only available via UDP
- application can send UDP datagrams to multicast address and ordinary port numbers
- application can join a multicast group my making its socket join the group
-
when multicast message reaches a computer, copies are forwarded to all processes with sockets bound to the multicast address + port number
- see Textbook for multicast peer
Failure model
- datagrams multicast over IP multicast suffer from omission failures
- unreliable multicast: not guaranteed to be delivered to any particular group member
- messages may not get to 1+ members due to a single omission (i.e. some, not all members receive it)
- unreliable multicast: not guaranteed to be delivered to any particular group member
Overlay Networks: Network virtualisation
- virtual networks can be constructed on top of an existing network (e.g. the Internet)
- these can be tailored to meet to needs of a particular distributed system
- overlay network: virtual network of nodes and virtual links sitting on top of an underlying network (e.g. IP network)
- tailored service for needs of application
- more efficient operation in a particular networked environment
- additional features e.g. multicast/secure communication
- overlays are layer sitting outside standard architecture, allowing degrees of freedom to be exploited
Advantages
- new network services can be defined without changing underlying network
- encourage experimentation to drive innovation
- multiple overlays can coexist: more open, extensible network architecture
Disadvantages
- additional indirection: performance penalty
- additional complexity of network services
Types
Skype
- Skype is impure P2P application for VoIP
- developed by Kazaa - similar to Kazaa P2P filesharing application
- virtual network: establishes connections between people
- no IP address/port required to establish a call
- architecture: P2P infrastructure of ordinary users’ machines (hosts) and super nodes
- super nodes: ordinary Skype hosts with sufficient capabilities to carry out enhanced role
- selected based on demand, based on available bandwidth, reachability, availability
- super nodes: ordinary Skype hosts with sufficient capabilities to carry out enhanced role
- user connection:
- authentication via well-known login server
- make contact with a selected super node (IP addr:port is stored in client cache)
- search for users:
- super nodes: perform efficient search of global index of users
- search orchestrated by client’s chosen super node
- expanding search until specified user found
- on average 8 super nodes are contacted, 3-4s
- voice connection: once user is discovered, Skype establishes voice connection between two parties
- TCP: signal call requests/terminations
- UDP/TCP for streaming audio. TCP sometimes required to get around firewalls
Properties
- diameter/depth: shortest path between any two nodes
- affects latency
- degree: sum of in and out degree
- affects bandwidth consumption
- scalability bottlenecks: tailor depth/degree to reduce latency and bandwidth consumption
Various Configurations
- latency/diameter $O(1)$, bandwidth/degree $O(n)$
- every node connects to every other node, degree of all nodes is $n-1$
- diameter $O(n)$, degree $O(1)$
- chaining
- diameter $O(\log{n})$, degree $O(1)$
- $n$-ary Tree
- diameter $O(1)$, degree $O(\sqrt{n})$