Tuning the Gateware from Python

This section describes how to configure the gateware once the FPGA has been programmed. Configuration means writing to a sequence of registers and BRAMs to:

  • Synchronize the logic
  • Select requantization depth (4-bit / 1-bit)
  • Set FFT parameters
  • Set the cross correlation parameters
  • Select frequency channels
  • Autotune the digital gain coefficients (4-bit mode only)

The Design Overview section documents how the gateware is designed in Simulink. If you are unfamiliar with gateware design, or find this section confusing, you may want to read that section first.

Reading and Writing to Programmable Registers

In Python, the CasperFpga class lets you interface with named FPGA registers. The gateware must be tuned and configured according to the user's needs, such as which channels to pick and which bit mode to select. This configuration happens in five steps.

  • Setup. TODO
  • Set the channel order. Re-order the frequency channels so that the UDP payload packetizer selects the correct channels. For example, to select only channels 120:136 you must re-order the channels so that 120:136 occur at the beginning of each frame. For a deeper explanation of why this is needed, see the packetizer section.
  • Optionally set the 4-bit coefficients.
  • Tune. TODO
  • Optionally update the 4-bit coefficients based on OBC data. TODO
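
The register interface described above can be sketched without hardware. The classes below are hypothetical in-memory stand-ins, not part of casperfpga; they only mimic the shape of the calls that appear in the source on this page (`registers.<name>.write_int`, `registers.<name>.read_uint`, and `write(name, data, offset=0)`). The 32-bit big-endian register width and the `read` helper are assumptions made for illustration.

```python
import struct
from types import SimpleNamespace


class FakeRegister:
    """Hypothetical in-memory stand-in for one 32-bit FPGA register."""

    def __init__(self):
        self._raw = bytes(4)

    def write_int(self, value):
        # Keep the low 32 bits and store them big-endian
        self._raw = struct.pack(">I", value & 0xFFFFFFFF)

    def read_uint(self):
        return struct.unpack(">I", self._raw)[0]


class FakeFpga:
    """Hypothetical stand-in exposing named registers and BRAMs."""

    def __init__(self, register_names, bram_sizes):
        self.registers = SimpleNamespace(
            **{name: FakeRegister() for name in register_names}
        )
        self._brams = {name: bytearray(size) for name, size in bram_sizes.items()}

    def write(self, name, data, offset=0):
        # Copy raw bytes into the named BRAM, starting at a byte offset
        self._brams[name][offset:offset + len(data)] = data

    def read(self, name, size, offset=0):
        return bytes(self._brams[name][offset:offset + size])


fpga = FakeFpga(["acc_len", "sel"], {"four_bit_reorder_map1": 16})
fpga.registers.acc_len.write_int(4096)          # write a named register
readback = fpga.registers.acc_len.read_uint()   # read it back
fpga.write("four_bit_reorder_map1", struct.pack(">2H", 120, 121), offset=0)
```

Against real hardware the same calls would go through the `cfpga` attribute of the digitizer object instead of `FakeFpga`.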

Setup

TODO: write this section...

sparrow_albatros.AlbatrosDigitizer.setup()
Source code in software/sparrow_albatros.py
def setup(self):
    self.logger.info("Programming FPGA")
    self.program_fpga()
    fpga_clock_mhz = self.cfpga.estimate_fpga_clock()
    self.logger.info(f"Estimated FPGA clock is {fpga_clock_mhz:.2f}")
    self.logger.info("Initializing ADCs")
    self.initialize_adc()

Set channel order

TODO: write this section...

sparrow_albatros.AlbatrosDigitizer.set_channel_order(channels, bits)

Sets the firmware channels

Source code in software/sparrow_albatros.py
def set_channel_order(self, channels, bits):
    """Sets the firmware channels"""
    # hard coded names of brams
    if bits==1: channel_map="one_bit_reorder_map1"
    elif bits==4: channel_map="four_bit_reorder_map1" 
    else: raise ValueError(f"Bits must be 1 or 4, not {bits}")
    self.cfpga.write(channel_map, channels.astype(">H").tobytes(), offset=0)
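
To make the byte layout concrete, here is a minimal sketch of what `channels.astype(">H").tobytes()` produces, written with the standard `struct` module instead of NumPy. The helper name `channels_to_bram_bytes` is hypothetical, and reading 120:136 as a Python-style half-open range (120 to 135) is an assumption. The limit of 2048 channels is taken from the hard-coded value in `get_optimal_coeffs_from_acc`. Whether the map must list every channel or only the selected ones is not specified here, so the sketch covers the encoding only.

```python
import struct

NUM_CHANNELS = 2048  # hard-coded channel count used elsewhere in sparrow_albatros.py


def channels_to_bram_bytes(channels):
    """Pack channel indices as big-endian unsigned 16-bit integers (">H")."""
    channels = list(channels)
    for ch in channels:
        if not 0 <= ch < NUM_CHANNELS:
            raise ValueError(f"Channel {ch} is outside 0..{NUM_CHANNELS - 1}")
    return struct.pack(f">{len(channels)}H", *channels)


selected = range(120, 136)            # assumed half-open: channels 120..135
payload = channels_to_bram_bytes(selected)
print(len(payload), payload[:4].hex())
```

Each channel index occupies two bytes, most significant byte first, so 16 channels produce a 32-byte payload.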

Tuning registers

TODO: write this section...

sparrow_albatros.AlbatrosDigitizer.tune(ref_clock, fftshift, acc_len, dest_ip, dest_prt, spectra_per_packet, bytes_per_spectrum, bits, dest_mac=0)

This method "tunes" the FPGA Gateware's input registers to suit the user's needs.

  • Assumes fpga has been programmed and cfpga is running.
  • Sets values in FPGA's programmable Registers and BRAMs.
  • Basic sanity checks of FPGA output values, e.g. FFT overflows.
Parameters:
  • ref_clock (float) –

    Reference clock in MHz

  • fftshift (int) –

    FFT shift schedule. This int is re-interpreted as a sequence of bits, the 12 LSBs are used to define the shift schedule. Do 1/0 for on/off.*

  • acc_len (int) –

    Number of spectra accumulated to integrate correlations.*

  • dest_ip (str) –

    IP address to send packets to. The input is an IPV4 string following the convention "x.x.x.x", e.g. "192.168.0.1". This IP address is reinterpreted as an int and that value is written to a register on the FPGA.

  • dest_prt (int) –

    Destination port number.*

  • spectra_per_packet (int) –

    Number of spectra to include in each UDP packet.*

  • bytes_per_spectrum (int) –

    Number of bytes in one quantized spectrum.*

  • bits (int) –

    Number of bits per real/imaginary component after requantization. Takes values 1 or 4.

  • dest_mac (int, default: 0 ) –

    Not yet implemented. Configure the destination MAC address.

* This parameter's value gets written to a register on the FPGA.
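
Two of these parameters are reinterpreted before they reach a register, which the following sketch illustrates. Both helper names are hypothetical. `ip_to_int` uses the standard `ipaddress` module and is only assumed to match what `str2ip` does, since `str2ip` is not shown on this page. `shift_bits` displays the 12 LSBs of `fftshift`; which bit controls which FFT stage is not stated here, so no stage ordering is implied.

```python
import ipaddress


def shift_bits(fftshift, width=12):
    """Return the `width` least-significant bits of fftshift as a bit string."""
    mask = (1 << width) - 1
    return format(fftshift & mask, f"0{width}b")


def ip_to_int(dest_ip):
    """Convert a dotted-quad IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(dest_ip))


print(shift_bits(0xFFFF))          # bits above the 12 LSBs are ignored
print(shift_bits(0b101))
print(hex(ip_to_int("192.168.0.1")))
```

`ipaddress.IPv4Address` raises a `ValueError` subclass for malformed strings, which gives an early failure before anything is written to the FPGA.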

Source code in software/sparrow_albatros.py
def tune(self, ref_clock, fftshift, acc_len, dest_ip, 
        dest_prt, spectra_per_packet, bytes_per_spectrum,
        bits, dest_mac:int=0):
    """
    This method "tunes" the FPGA Gateware's input registers to suit the 
    user's needs.

    - Assumes fpga has been programmed and cfpga is running. 
    - Sets values in FPGA's programmable Registers and BRAMs. 
    - Basic sanity checks of FPGA output values, e.g. FFT overflows. 

    Args:
        ref_clock (float): Reference clock in MHz
        fftshift (int): FFT shift schedule. This int is re-interpreted as a 
            sequence of bits, the 12 LSBs are used to define the shift 
            schedule. Do 1/0 for on/off.\*
        acc_len (int): Number of spectra accumulated to integrate correlations.\* 
        dest_ip (str): IP address to send packets to. The input is an IPV4 
            string following the convention "x.x.x.x", e.g. "192.168.0.1". 
            This IP address is reinterpreted as an int and that value is 
            written to a register on the FPGA. 
        dest_prt (int): Destination port number.\*
        spectra_per_packet (int): Number of spectra to include in each UDP 
            packet.\*
        bytes_per_spectrum (int): Number of bytes in one quantized spectrum.\*
        bits (int): Number of bits per real/imaginary component after 
            requantization. Takes values 1 or 4. 
        dest_mac (int): *Not yet implemented.* Configure the destination MAC 
            address. 

    \* This parameter's value gets written to a register on the FPGA.
    """
    MTU=1500 # max number of bytes in a packet
    assert spectra_per_packet < (1<<5), "spec-per-pack too large for slice, aborting"
    assert spectra_per_packet * bytes_per_spectrum <= MTU-8, "Packets too large, will cause fragmentation"
    assert bits in (1,4), f"Baseband requantization mode must be 1 or 4, not {bits}"
    # Assume bitstream already uploaded, data in self.cfpga
    # Assume ADCs already initialized including that get_system_information...
    # Inherit adc's logger level
    self.adc.ref = ref_clock # Set reference clock for ADC
    # ADC calibration assumed already aligned (?)
    # Need to set the ADC gain?
    # Get info from and set registers 
    self.logger.info(f"FPGA clock: {self.cfpga.estimate_fpga_clock():.2f}")
    self.logger.info(f"Set FFT shift schedule to {fftshift:b}")
    self.cfpga.registers.pfb_fft_shift.write_int(fftshift)
    self.logger.info(f"Set correlator accumulation length to {acc_len}")
    self.cfpga.registers.acc_len.write_int(acc_len)
    self.logger.info("Reset GBE (UDP packetizer)")
    self.cfpga.registers.gbe_rst.write_int(1)
    self.cfpga.registers.gbe_en.write_int(0)
    self.logger.info(f"Set spectra-per-packet to {spectra_per_packet}")
    self.cfpga.registers.packetiser_spectra_per_packet.write_int(spectra_per_packet)
    self.logger.info(f"Set bytes-per-spectrum to {bytes_per_spectrum}")
    self.cfpga.registers.packetiser_bytes_per_spectrum.write_int(bytes_per_spectrum)
    self.logger.info(f"Set quantization bit mode to {bits}-bits")
    if bits==1: self.cfpga.registers.sel.write_int(0)
    elif bits==4: self.cfpga.registers.sel.write_int(1)
    self.logger.info(f"NOT YET IMPLEMENTED: Setting destination MAC address to {dest_mac}")
    # TODO: set destination MAC address
    self.logger.info(f"Set destination IP address and port to {dest_ip}:{dest_prt}")
    self.cfpga.registers.dest_ip.write_int(str2ip(dest_ip))
    self.cfpga.registers.dest_prt.write_int(dest_prt)
    # Do we need to set mac address?
    #time.sleep(1.1) # dogmatically wait for regs to set before sending sync pulse
    self.sync_pulse()
    fft_of_count = self.cfpga.registers.fft_of_count.read_uint()
    if fft_of_count != 0:
        self.logger.warning(f"FFT overflowing: count={fft_of_count}")
    else:
        self.logger.info(f"No FFT overflows detected")
    self.logger.info("Enabling 1 GbE output")
    self.cfpga.registers.gbe_en.write_int(1)
    #self.logger.info("Leaving GBE reset high; pull it down manually once you think the negotiation has happened well!")
    self.cfpga.registers.gbe_rst.write_int(0)
    gbe_overflow = self.cfpga.registers.tx_of_cnt.read_uint()
    if gbe_overflow:
        self.logger.warning(f"GbE transmit overflowing: count={gbe_overflow}")
    else:
        self.logger.info("No GbE overflows detected")
    self.logger.info("Setup and tuning complete")
    return
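
The two packet-size assertions at the top of `tune` can be checked before calling it. The sketch below mirrors those conditions; the function and constant names are hypothetical, and the source does not say what the 8 reserved bytes are for, so they are left unnamed beyond that.

```python
MTU = 1500                    # max number of bytes in a packet, as in tune()
RESERVED_BYTES = 8            # tune() compares the payload against MTU - 8
SPECTRA_LIMIT = 1 << 5        # spectra_per_packet must be strictly below this


def packet_layout_ok(spectra_per_packet, bytes_per_spectrum):
    """True if both packet-size assertions in tune() would pass."""
    fits_slice = spectra_per_packet < SPECTRA_LIMIT
    fits_mtu = spectra_per_packet * bytes_per_spectrum <= MTU - RESERVED_BYTES
    return fits_slice and fits_mtu


def max_spectra_per_packet(bytes_per_spectrum):
    """Largest spectra_per_packet that satisfies both conditions."""
    by_size = (MTU - RESERVED_BYTES) // bytes_per_spectrum
    return min(SPECTRA_LIMIT - 1, by_size)


for nbytes in (32, 100):
    print(nbytes, max_spectra_per_packet(nbytes))
```

For small spectra the 5-bit limit dominates, while for larger spectra the MTU does. Checking up front also avoids relying on `assert`, which Python skips when run with `-O`.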

Tuning the 4-bit digital gain coefficients (special case)

TODO: write this section... here's an outline:

  • To avoid an iterative loop of capturing 4-bit data and nudging the digital gain coefficients on each channel up or down (which may never converge under intermittent RFI), we set the 4-bit gain coefficients in one shot from a power reading taken from the on-board correlator.
  • Tune the gateware so that it collects data as you'd like it to.
  • Wait for the on-board correlator's accumulator to fill up.
  • Read the auto-correlation power in each channel to estimate the optimal digital gain coefficient.
  • Write the 4-bit digital gain coefficients.

Implementing this requires some bookkeeping.

Bookkeeping with bit-growth

See also the release notes of the bit-growth FFT implementation.

Fix24_23s come out of the FFT; the real and imaginary parts are bussed together and interpreted as UFix_48s. The power block outputs UFix_42_40, which is a shifted squared value. We drop the six LSBs to make room for MSBs in the 64-bit accumulator.

The number of spectra accumulated in the auto-correlator is denoted acc_len or \(L\). The best estimate for the STD of each of the real and imaginary components is

\[ \sigma = \sqrt{\frac{P}{2 \cdot 2^{40} \cdot L}} \]

where \(P\) is the power read from the autocorr BRAM interpreted as a UInt64. The four-bit digital gains must be set so that the quantization interval \(\Delta\) equals 0.293 times the STD (see the Optimal Digital Gains section). In this implementation, the quantization interval stays fixed at 1/8 and the digital gains scale the signals instead.

We must therefore multiply the signals by a gain factor, \(g\), such that

\[ g \cdot \sigma = \frac{1/8}{0.293} \Rightarrow g = \frac{1/8}{0.293 \cdot \sigma}. \]
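
As a numeric check of the two formulas above, the sketch below builds a synthetic accumulator reading for a known STD and recovers both \(\sigma\) and \(g\). The function names and the chosen values (`acc_len = 1000`, a true STD of 1/16) are illustrative assumptions, not measured data.

```python
import math


def sigma_from_power(power, acc_len):
    """STD of each real/imaginary component from the accumulated power P."""
    return math.sqrt(power / (2 * 2**40 * acc_len))


def gain_from_sigma(sigma, delta=1 / 8, optimal=0.293):
    """Gain g such that g * sigma = delta / optimal."""
    return delta / (optimal * sigma)


acc_len = 1000                 # L, number of accumulated spectra (made up)
true_sigma = 1 / 16            # synthetic STD used to build the power reading
power = round(2 * 2**40 * acc_len * true_sigma**2)   # what the BRAM would hold

sigma = sigma_from_power(power, acc_len)
g = gain_from_sigma(sigma)
print(sigma, g, g * sigma)
```

After applying the gain, the scaled STD `g * sigma` equals (1/8)/0.293, roughly 0.427, so the fixed quantization interval of 1/8 is 0.293 of the scaled STD.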

This factor is written to BRAM as a UInt_32 but is reinterpreted as a Fix_32_17, so \(g\) must be multiplied by \(2^{17}\) before it is written. The following code snippet computes the optimal gain.

import numpy as np

# In the class, `read_pols` is `self.read_pols` and `acc_len` is read from the acc_len register
pol00 = read_pols(['pol00','pol11'])['pol00'] / (1<<40) # Each freq channel has different power
stds0 = np.sqrt(pol00 / (2 * acc_len))                  # ...and therefore a different STD
gs = (1/8) / (0.293 * stds0)                            # Calculate the digital gains
gs *= (1<<17)                                           # Multiply by 2^17 for packaging
gs = np.minimum(gs, (1<<31)-1)                          # Clip the gains so that they fit
# Now the `gs` array is ready for writing to BRAM

...as implemented in

sparrow_albatros.AlbatrosDigitizer.get_optimal_coeffs_from_acc(chans)

Reads accumulator to set 4-bit digital gain coefficients

Assumes fpga is setup and well tuned.

self : AlbatrosDigitizer object for reading the acc
chans : numpy integer array that can be used to index accumulator pols

Source code in software/sparrow_albatros.py
def get_optimal_coeffs_from_acc(self, chans):
    """Reads accumulator to set 4-bit digital gain coefficients

    Assumes fpga is setup and well tuned.

    self : AlbatrosDigitizer object for reading the acc
    chans : numpy integer array can be used to index accumulator pols"""
    # for the same channel, we want to apply same digital gain to each pol
    quant4_delta = 1/8  # 0.125 is the quantization delta for 4-bit signed 4_3 as on fpga
                        # clips at plus/minus 0.875
    quant4_optimal = 0.293 # optimal 15-level quantization delta for gaussian with std=1
    _pols = self.read_pols(['pol00','pol11'])
    acc_len = self.cfpga.registers.acc_len.read_uint() # number of spectra to accumulate
    OPT_COEFF_MODE="bitgrowth"
    if OPT_COEFF_MODE=="legacy":
        # these are read as int64 but they are in fact 64_35 for autocorr and 64_34 for xcorr
        pol00,pol11 = _pols['pol00'] / (1<<35), _pols['pol11'] / (1<<35) 
        pol00_stds = np.sqrt(pol00 / (2*acc_len)) # stds of re or imaginary parts
        pol11_stds = np.sqrt(pol11 / (2*acc_len)) 
        coeffs_pol0 = np.zeros(2048) # hard coded num of chans as 2048
        coeffs_pol1 = np.zeros(2048) # hard coded num of chans as 2048
        coeffs_pol0[chans] = quant4_delta / (pol00_stds[chans] * quant4_optimal)
        coeffs_pol1[chans] = quant4_delta / (pol11_stds[chans] * quant4_optimal)
        coeffs_pol0[chans] *= (1<<17) # bram is re-interpreted as ufix 32_17
        coeffs_pol1[chans] *= (1<<17) # bram is re-interpreted as ufix 32_17
    elif OPT_COEFF_MODE=="bitgrowth":
        pol00,pol11 = _pols['pol00'] / (1<<40), _pols['pol11'] / (1<<40)
        pol00_stds = np.sqrt(pol00 / (2*acc_len)) 
        pol11_stds = np.sqrt(pol11 / (2*acc_len))
        g0 = (1/8) / (0.293 * pol00_stds) # optimal gain coefficients
        g1 = (1/8) / (0.293 * pol11_stds) # optimal gain coefficients
        coeffs_pol0 = np.zeros(2048) # hard coded num of chans as 2048
        coeffs_pol1 = np.zeros(2048) # hard coded num of chans as 2048
        coeffs_pol0[chans] = g0[chans] * (1<<15) * 1.414 # fudge factor sqrt 2
        coeffs_pol1[chans] = g1[chans] * (1<<15) * 1.414 # fudge factor sqrt 2 
    # not sure where missing factor of two comes from 
    # sets stds to roughly 2.83 [plus-minus systematic 0.05])
    coeffs_pol0[coeffs_pol0 > (1<<31)-1] = (1<<31)-1 # clip coeffs at max signed-int value
    coeffs_pol1[coeffs_pol1 > (1<<31)-1] = (1<<31)-1 # clip coeffs at max signed-int value
    coeffs_pol0 = np.array(coeffs_pol0 + 0.5, dtype='>I')
    coeffs_pol1 = np.array(coeffs_pol1 + 0.5, dtype='>I')
    return coeffs_pol0,coeffs_pol1
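
The last four lines of the method clip, round, and convert the coefficients to big-endian unsigned 32-bit integers. The sketch below reproduces that step with the standard library so its behaviour can be inspected on a few values. The name `pack_coeffs` is hypothetical, and writing the result to a BRAM is left out because the coefficient BRAM name does not appear on this page.

```python
import struct

MAX_COEFF = (1 << 31) - 1   # clip level used in get_optimal_coeffs_from_acc


def pack_coeffs(coeffs):
    """Clip, round half up, and pack coefficients as big-endian uint32 (">I")."""
    clipped = [min(c, MAX_COEFF) for c in coeffs]
    # Adding 0.5 then truncating rounds non-negative values to the nearest integer
    rounded = [int(c + 0.5) for c in clipped]
    return struct.pack(f">{len(rounded)}I", *rounded)


# 0.0 is what unselected channels keep; inf arises when a channel's STD is zero
example = [0.0, 1.4, 1.5, 1e12, float("inf")]
packed = pack_coeffs(example)
print(struct.unpack(f">{len(example)}I", packed))
```

Channels that were not listed in `chans` keep a coefficient of zero, and a channel with zero measured power saturates at the clip level instead of overflowing.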

The communication stack

You may be wondering how Python is able to read from and write to FPGA registers.