
5. Data sources

Data sources are defined by the source name ... end source block. A source block describes the type of source, whether and how data should be collected from it and other information associated with it.

A source only becomes a collector when it has a collect statement; a source can contain, for example, router authentication information for route collection, even if the router is not used for collecting flows.

Sources are processed, by default, every ten minutes. This can be modified using the frequency statement.

5.1. General parameters

5.1.1. Ring buffers and daemons

The NetFlow and BPF interfaces of ipacc collect data in real time. Flow data from these interfaces is written into a "ring buffer", from which the charging processes can then read current data. The daemon writes flow information from the beginning of the buffer, then when it reaches the end it returns to the beginning again, overwriting the oldest data in the buffer.
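The overwrite behaviour described above can be illustrated with a minimal Python sketch. This is not ipacc's implementation: the class name, the record strings and the reader cursor are all hypothetical, and a real ring would hold fixed-size binary records in a memory-mapped file.

```python
class RingBuffer:
    """Fixed-capacity buffer that overwrites its oldest entries when full."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.slots = [None] * capacity
        self.capacity = capacity
        self.written = 0  # total number of records ever written

    def write(self, record):
        # The write position wraps back to slot 0 after the last slot,
        # so the oldest record is the one that gets overwritten.
        self.slots[self.written % self.capacity] = record
        self.written += 1

    def read_from(self, cursor):
        """Return (records, new_cursor, lost) for a reader at position cursor.

        lost counts records that were overwritten before being read.
        """
        oldest = max(0, self.written - self.capacity)
        lost = max(0, oldest - cursor)
        start = max(cursor, oldest)
        records = [self.slots[i % self.capacity]
                   for i in range(start, self.written)]
        return records, self.written, lost


ring = RingBuffer(4)
for n in range(6):
    ring.write(f"flow-{n}")

# The reader has not run since the start, so the two oldest flows are gone.
records, cursor, lost = ring.read_from(0)
print(records, lost)
```

The `lost` count shows why the buffer has to be sized for the peak volume between processing runs: anything the reader has not consumed before the writer laps it is overwritten.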

Ring buffers are memory mapped and of fixed size. A buffer must therefore be large enough to hold the peak volume of flows arriving between processing runs. However, it should, if possible, be small enough to fit entirely in memory. The maximum size is two gigabytes.
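As a rough aid to choosing a size, the arithmetic can be sketched in Python. The per-record size and the headroom factor below are assumptions made for illustration only (the size of an ipacc ring record is not stated here), and "two gigabytes" is assumed to mean 2 GiB.

```python
MAX_RING_BYTES = 2 * 1024**3  # assumed interpretation of "two gigabytes"


def ring_bytes_needed(peak_flows_per_second, seconds_between_runs,
                      record_bytes, headroom=2.0):
    """Estimate the ring size needed to hold peak flows between runs.

    record_bytes and headroom are hypothetical inputs, not ipacc values.
    """
    needed = int(peak_flows_per_second * seconds_between_runs
                 * record_bytes * headroom)
    if needed > MAX_RING_BYTES:
        raise ValueError("estimate exceeds the maximum ring size")
    return needed


# 500 flows per second at peak, the default ten-minute processing
# interval, and a hypothetical 64-byte record.
print(ring_bytes_needed(500, 600, 64))
```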

The ring buffer size is controlled by the ring-size size statement. size specifies the size of the buffer in bytes, and is specified as a quantity specification.

Collectors writing to ring buffers may be processed on different hosts to the actual collector. This allows for configurations where all processing is on a central host, while collection runs on smaller hosts co-located with the actual data collection point.

Note that a collector ring buffer is sized according to the ring-size parameter only when it is first created. If the ring must be resized after initial creation, the ipacc collector must be stopped, the ring buffer file (named data/source-name.ring in the ipacc run directory) deleted, and the ipacc collector restarted.

The collect all statement controls whether the collector daemon should check if a packet or flow is going to be chargeable before inserting it into the ring buffer. If collect all is specified, all flows are written to the buffer, and the topology database is not parsed. If the majority of flows passing through the collector will be charged, collect all will greatly reduce the load on the collector daemon. If only a minority of flows will be charged, and collect all is not specified, a smaller ring buffer can be used and the subsequent processing of the collected flows may be more efficient.

Note that the collector daemons do not consult the dynamic IP address assignment files, so if dynamic address assignment is used and collect all is not specified, the configuration must be arranged so that IP addresses in the dynamic blocks are considered "chargeable" without a dynamic assignment.

5.1.2. Placing source in topology

The select statement is used to select which networks a given source is collecting data from. This is important in configurations where chargeable traffic may pass through two collectors, and therefore be counted twice. For example, if there are two sites North and South, independently connected to the Internet, but also connected to each other, collection must take place at both sites, or Internet traffic data will not be collected. But if collection is done at both sites, traffic between hosts at the two sites will be counted at both locations.

The solution is to use the select group ... statement. This marks a source as only being "responsible" for hosts in (or below) a particular group in the topology. Traffic not relating to the selected group is ignored. Traffic relating only to the selected group is counted in full, and traffic relating to the selected group but also relating to groups selected by other sources is counted in the inbound direction only. For example:

In this example, traffic from 192.168.1.1 (in the north block) to 192.168.2.2 (in the south block) will be counted by the south source, being inbound at that point, while the reverse traffic will be counted by the north source. Note that in this case both addresses will be charged for traffic in both directions; the select statement is simply to prevent traffic being counted multiple times when it passes through multiple collectors.
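The counting rule for select can be sketched as a small Python function. This is a reading of the rule as stated in this section, not ipacc's code; the function name is hypothetical, and the /24 prefixes for the north and south groups are assumed for illustration.

```python
import ipaddress


def counts_flow(own_nets, other_nets, src_ip, dst_ip):
    """Decide whether a source counts a flow under the select rule.

    own_nets: networks in the group this source selects.
    other_nets: networks in groups selected by other sources.
    """
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)

    def within(addr, nets):
        return any(addr in net for net in nets)

    src_own, dst_own = within(src, own_nets), within(dst, own_nets)
    if not (src_own or dst_own):
        return False  # unrelated to the selected group: ignored
    if within(src, other_nets) or within(dst, other_nets):
        return dst_own  # shared with another source: inbound only
    return True  # relates only to the selected group: counted in full


# Assumed prefixes for the two groups.
north = [ipaddress.ip_network("192.168.1.0/24")]
south = [ipaddress.ip_network("192.168.2.0/24")]

print(counts_flow(north, south, "192.168.1.1", "192.168.2.2"))  # north source
print(counts_flow(south, north, "192.168.1.1", "192.168.2.2"))  # south source
```

Each flow between the two sites is counted by exactly one source (the one at which it is inbound), while traffic between a site and the rest of the Internet is counted only by that site's source.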

If double charging is not desired, the charging statements should be amended to charge traffic that could be charged at both ends in one direction only, e.g.

In this case, only inbound national traffic is charged, on the basis that each end of the connection is charged to one customer or the other, while other traffic is charged in both directions.

5.1.3. Router login parameters

ipacc supports connection to the management interface of Cisco routers or Quagga processes for the purposes of collecting routing information, and collecting data using the Cisco ip accounting interface. To support this, the source must include the address of the router, its login user (if required), login password, and for IP accounting, the "enable" password.

The router address address statement specifies the host name or IP address of the router or device to connect to.

Cisco passwords can be stored in clear text or using Cisco's "type 7" encoding, compatible with the format emitted in saved Cisco configuration files when service password-encryption is used. For example, the following statements encode the same password:

"Type 7" passwords can be created using the ipacc-password program. Note that since "type 7" encoded passwords contain a space, they must be quoted.

WARNING: encoded passwords are not encrypted. A simple, reversible algorithm is used to reduce the risk of accidental visual disclosure of passwords to unauthorised persons when viewing configurations containing them. Configuration files containing encoded passwords must be protected against unauthorised access.
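To show how little protection the encoding offers, the following Python sketch implements the widely published description of Cisco "type 7" encoding: a two-digit decimal offset followed by hex pairs, each XORed with a fixed key string. It is not taken from the ipacc-password program, the function names are hypothetical, and it handles only the encoded string itself, not the quoting or prefix that ipacc expects in its configuration.

```python
# Translation key as given in widely published descriptions of type 7.
KEY = "dsfd;kfoA,.iyewrkldJKDHSUBsgvca69834ncxv9873254k;fg87"


def type7_decode(encoded):
    """Recover the clear text from a type 7 string such as 0822455D0A16."""
    seed = int(encoded[:2])            # starting offset into KEY
    body = bytes.fromhex(encoded[2:])  # XORed password bytes
    return "".join(chr(b ^ ord(KEY[(seed + i) % len(KEY)]))
                   for i, b in enumerate(body))


def type7_encode(plain, seed=8):
    """Encode clear text with the given starting offset."""
    out = f"{seed:02d}"
    for i, ch in enumerate(plain):
        out += f"{ord(ch) ^ ord(KEY[(seed + i) % len(KEY)]):02X}"
    return out


print(type7_decode(type7_encode("example")))
```

No secret is involved in the decoding step, which is why file permissions, not the encoding, must be relied on to protect these passwords.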

The router username username statement specifies a username for logging into a router. Note that the standard authentication does not use a username; this should only be used if separate users are set up on the router.

The router password password statement specifies the password to log into the router. This is required for both the route collection and ip accounting processes.

The router enable password statement specifies the password to gain privileged access to the router (the "enable" password). This is only required if the ip accounting interface is used.

The timeout when reading data from a router can be set with the router timeout timeout statement. The timeout parameter is the number of seconds to wait when expecting data before failing the connection, and is treated as an interval specification.

5.1.4. Aging collected data

The keep time statement takes an interval value, and determines how long raw data should be kept on disk before being removed. Ideally, data should be kept long enough to cover outages in the processing of collected data, not merely the interval at which that data is collected.

5.1.5. Debugging information

The flow-file filename statement can be used to obtain a dump of all flows processed by the processing runs. filename may contain strftime(3) arguments. The format of the flow file is:

All fields are separated by tabs. src is the name of the source that collected the flow; start and end are the start and end times of the flow; src-ip and dst-ip are the source and destination IP addresses; prot is the protocol (e.g. 6 for TCP; 17 for UDP); sp and dp are the source and destination port numbers for TCP and UDP flows; tos is the type of service field; and pkts and bytes are the packet and byte counts for the flow.
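A flow file can be read with a few lines of Python. The sketch below assumes that the fields appear in the order in which they are described above, and that prot, sp, dp, pkts and bytes are written as decimal integers; the time and tos fields are kept as strings because their exact representation is not specified here. The record type and function names are hypothetical, and the sample line is made up.

```python
from typing import NamedTuple


class FlowRecord(NamedTuple):
    src: str        # name of the collecting source
    start: str      # start time, kept as written
    end: str        # end time, kept as written
    src_ip: str
    dst_ip: str
    prot: int       # e.g. 6 for TCP, 17 for UDP
    sp: int         # source port
    dp: int         # destination port
    tos: str        # type of service, kept as written
    pkts: int
    byte_count: int


def parse_flow_line(line):
    """Split one tab-separated flow line into a FlowRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    (src, start, end, src_ip, dst_ip,
     prot, sp, dp, tos, pkts, nbytes) = fields
    return FlowRecord(src, start, end, src_ip, dst_ip,
                      int(prot), int(sp), int(dp), tos,
                      int(pkts), int(nbytes))


# Made-up sample line; the time format shown is an assumption.
sample = "\t".join(["north", "1700000000", "1700000060", "192.168.1.1",
                    "192.168.2.2", "6", "40000", "80", "0", "10", "1500"])
record = parse_flow_line(sample + "\n")
print(record.src, record.prot, record.byte_count)
```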

The dump-file filename statement causes the internal database of ipacc's engine to be dumped in a human readable form after reading the configuration but before processing. filename may contain strftime(3) arguments. Note that a dump file will overwrite an existing file of the same name.
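The effect of strftime(3) conversions in a file name can be previewed in Python, whose datetime.strftime accepts the same common conversions. The file name pattern below is only an example, and which timestamp ipacc applies (and in which time zone) is not stated here.

```python
from datetime import datetime

# Hypothetical file name pattern containing strftime conversions.
pattern = "dump-%Y%m%d-%H%M.txt"

# A fixed timestamp is used so the result is reproducible.
when = datetime(2024, 3, 5, 14, 30)
name = when.strftime(pattern)
print(name)  # dump-20240305-1430.txt
```

Including a date or time conversion gives each run its own file name, which avoids overwriting an earlier dump.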

Collectors can be made to regularly dump statistics as an informational message, using the statistics-interval interval statement. The message written to the log is in the form:

where source is the source being collected, time is the number of seconds the statistics line represents, flows is the number of flows collected, and packets and bytes are the number of packets and bytes those flows represent. Counters are zeroed after each statistics report.

5.1.6. Dynamic addressing

The dynamic-file filename [backups] statement specifies the file name(s) where dynamic address information can be found. filename specifies the file to read. backups, if specified, defines the number of backup files, named filename.1, filename.2 and so on up to the specified number.
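The naming scheme for backup files can be expressed as a short Python function. The function name and the sample file name are hypothetical, and the order in which ipacc actually reads the files is not specified here.

```python
def dynamic_file_names(filename, backups=0):
    """List the main dynamic file followed by its numbered backups."""
    return [filename] + [f"{filename}.{n}" for n in range(1, backups + 1)]


print(dynamic_file_names("dynamic.log", 3))
```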

Dynamic IP addressing, including the format of the dynamic addressing files, is discussed in more detail in the section on Dynamic addressing under Advanced topics.

