Search Microcontrollers

Sunday, September 21, 2014

FPGA - Using RAM Memory - M9K Blocks 1

We all know that MCUs are easier to work with than general-purpose CPUs because pretty much everything an application needs is "included", in particular a certain amount of RAM.

FPGAs are no different: most of them include a number of RAM blocks (called M9K in Altera Cyclone IV devices, or M4K in Cyclone II ones; I am not sure about the other families).

(available memory blocks by family - From Altera's documentation)


If you want to implement a CPU within your FPGA (typically a NIOS II, but any CPU would have the same requirement), you need some RAM to store the code to be executed and any data it works on.

Since FPGAs have a high count of I/O pins, you can dedicate a few of them to interfacing an external memory chip, or you can use the internal memory.

Altera M9K memory blocks are 9 Kbit (9216 bit) memory blocks that can be organized in different ways (e.g. as 1K words of 9 bits each).
The reason for having 9 bits instead of 8 is to provide an additional bit for parity.

Different devices have different numbers of M9K blocks; the following table lists the specs of the Cyclone IV E family.

  ( from Altera's website )

One interesting feature of these memory blocks is that they are dual-ported.
This means that they can be configured with two independent ports, each with its own address and data bus, so that one location can be written while another is read.
Does it matter?
I guess sometimes it does: think about a VGA interface, like the one we described in the previous post.

The VGA implementation in that post did not use memory; it simply calculated the RGB components from the X/Y coordinates, using a binary formula.
For practical uses you will hardly ever take that approach. Instead you will probably have a process that writes data to a memory bank where each memory location represents a pixel and all of them together form an image.
Another, concurrent process will read the memory bank sequentially and use the content to drive the color components.
Technically this means one process will write to some arbitrary location L(x1,y1) while another reads from L(x2,y2) at the same time.
If the two processes can use separate ports, the problem is solved!
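To make the idea concrete, here is a minimal sketch of a memory with one write port and one read port sharing a single clock. The module name, parameters and port names are assumptions made for this example (they are not taken from Altera's code), and whether the tool really maps it onto M9K blocks depends on the synthesis settings.

```verilog
// Hypothetical sketch: simple dual-port RAM (one write port, one read port).
// Module, parameter and port names are illustrative assumptions.
module simple_dp_ram #(
    parameter ADDR_WIDTH = 10,  // 1024 locations
    parameter DATA_WIDTH = 8    // e.g. one 8-bit pixel per location
) (
    input  wire                  clock,
    // write port: driven by the process that draws the image
    input  wire                  wren,
    input  wire [ADDR_WIDTH-1:0] wraddress,
    input  wire [DATA_WIDTH-1:0] data,
    // read port: driven by the process that scans the pixels out
    input  wire [ADDR_WIDTH-1:0] rdaddress,
    output reg  [DATA_WIDTH-1:0] q
);
    reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

    always @(posedge clock) begin
        if (wren)
            mem[wraddress] <= data;
        // Registered read: q changes one clock edge after rdaddress.
        // A read of the address being written returns the old word.
        q <= mem[rdaddress];
    end
endmodule
```

The writer only touches wraddress/data/wren and the reader only touches rdaddress/q, so neither has to wait for the other.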

Note: I cannot compare the quality of Altera's documentation with that of its competitors, since Altera's is the only one I have used when dealing with FPGAs; I can only say it is GREAT.
There is a ton of useful documentation, examples, tutorials, online training etc. available on the Altera website.

Here I found an example implementation of a true dual port memory block :

module true_dp_ram (
address_a,
address_b,
clock,
data_a,
data_b,
rden_a,
rden_b,
wren_a,
wren_b,
q_a,
q_b);

input [3:0]  address_a;
input [3:0]  address_b;
input clock;
input [12:0]  data_a;
input [12:0]  data_b;
input rden_a;
input rden_b;
input wren_a;
input wren_b;
output [12:0]  q_a;
output [12:0]  q_b;

[...]

As you can see, this example has two 4-bit address inputs, two separate 13-bit data inputs and two 13-bit outputs.
Additional inputs are the clock (RAM needs one) and the read enable / write enable for each port.
How cool is that?!!

Obviously you can trim down your requirements and work with a simpler 1-port interface if that is enough for your design.

Once you create a memory block, you can specify an initialization file in Quartus II, meaning that your memory is already populated when the FPGA is configured.
The problem is that to actually test whether the memory works we need some kind of output, so in our first example we will use just a few LEDs to display the bits of the memory locations as we scan them (slowly, so we can see them change).

A common way to provide init data is through a MIF (Memory Initialization File); follow this link to see the specs.
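For reference, a MIF for a memory of 32 words of 4 bits could look roughly like the fragment below. The data values are a made-up pattern chosen only for illustration, not the contents of the file used in this post.

```
-- Hypothetical contents for a 32 x 4-bit memory (example pattern only)
DEPTH = 32;            -- number of words
WIDTH = 4;             -- bits per word
ADDRESS_RADIX = HEX;
DATA_RADIX = BIN;
CONTENT
BEGIN
00 : 0001;
01 : 0010;
02 : 0100;
03 : 1000;
[04..1F] : 0000;       -- range syntax fills the remaining locations
END;
```

DEPTH and WIDTH have to match the size chosen for the memory block.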

The MegaWizard Plug-In Manager comes in handy here:


A simple 1 Port ram here, for this test


I am planning to use 4 LEDs to output the values, so it is handy to have 4-bit wide words. I chose an arbitrary depth of 32 words, which should be more than enough to verify my pattern.


Finally I created a MIF file and loaded it here (it is possible to update it later, even without recompiling the project).

The result is a new module which looks like this :

module ram_module (
address,
clock,
data,
wren,
q);

input [4:0]  address;
input clock;
input [3:0]  data;
input wren;
output [3:0]  q;
[...]

As you can see, it created a 5-bit address (we have 32 locations) plus a data input and a q output, each 4 bits wide as requested.
wren is the write enable; this initial test never writes, so it will not be used at all (a ROM would have produced the same result after all :) )

Ok, so now we need a top module that cycles through the RAM buffer and outputs the content to the LEDs.
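To give an idea of what sits behind those ports, here is a behavioural sketch of a 32 x 4 single-port RAM. It is NOT the code the MegaWizard generates (that body is elided above); the module name is made up, and the real block may add an output register on q, and therefore an extra clock of latency, depending on the options chosen in the wizard.

```verilog
// Behavioural sketch only, not the wizard-generated implementation.
// The module name ram_sketch is a hypothetical one chosen for this example.
module ram_sketch (
    input  wire [4:0] address,
    input  wire       clock,
    input  wire [3:0] data,
    input  wire       wren,
    output reg  [3:0] q
);
    reg [3:0] mem [0:31];   // 32 words of 4 bits

    always @(posedge clock) begin
        if (wren)
            mem[address] <= data;   // synchronous write
        q <= mem[address];          // synchronous read, valid after the clock edge
    end
endmodule
```

Nothing happens without a clock edge: both reads and writes are synchronous, which is why the top module has to clock the RAM before it can pick up a value.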

module m9ktest_top(leds, clock, reset);

output reg [3:0] leds;
input clock;
input reset; // active low

reg write;
reg [4:0] addr;
reg [3:0] datain;
wire [3:0] dataout;
reg ramclk;

// instantiate and connect the ram buffer
ram_module buffer
(
.address(addr),
.clock(ramclk),
.data(datain),
.wren(write),
.q(dataout)
);

reg [31:0] counter; // 24 bits would have been enough
wire CounterMaxed = (counter == 32'hFFFFFF);

always @ (posedge clock or negedge reset) // positive clock edge, or reset going low
begin
 if (!reset)
 begin // initial conditions
  addr <= 5'b0;
  ramclk <= 1'b0;
  counter <= 32'h0;
  write <= 1'b0;
  datain <= 4'b0; // write port is unused in this test
  leds <= 4'b0;
 end
 else if (CounterMaxed)
 begin
  counter <= 32'b0;
  addr <= addr + 5'b1; // next address; a 5 bit value wraps from 31 back to 0 by itself
  write <= 1'b0; // disable write
  ramclk <= 1'b1; // clock the ram
  leds <= dataout; // show the word read at the previous ram clock
 end
 else
 begin
  // just delay, my eyes are not fast enough for MHz range!
  ramclk <= 1'b0;
  counter <= counter + 32'b1; // increment counter
 end
end
endmodule

I know, it is not really elegant and I am sure it can be coded much better; bear with me, this is still one of my first Verilog experiments.
If you have suggestions / corrections, though, make sure to post them in the comments!

So, what to say about the code?
It is pretty self-explanatory. The one peculiar thing is that I am clocking the RAM only when I need to read it (is it ok? Best practices? Not sure there).
You will notice there is a single always block that is sensitive to both reset and the main clock. This is because in synthesizable Verilog a signal should be assigned from only one always block.
Makes sense if you think about it: driving the same signal from two independent processes would give unpredictable results, after all.
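On the clocking question: clocks generated by logic are usually discouraged in FPGA designs. A common alternative is to leave the RAM on the free-running clock and only act on its output once per "tick". The sketch below is a variation under that assumption, not a tested replacement for the module above: the module name, the COUNT_MAX parameter and the tick signal are made up for this example, and ram_module is the wizard-generated block described earlier.

```verilog
// Sketch: LED scanner with the RAM on the main clock (no generated clock).
// m9ktest_top_tick, COUNT_MAX and tick are hypothetical names for this example.
module m9ktest_top_tick #(
    parameter [23:0] COUNT_MAX = 24'hFFFFFF   // clocks between two LED updates
) (
    output reg [3:0] leds,
    input  wire      clock,
    input  wire      reset    // active low, as in the original top module
);
    reg  [4:0]  addr;
    reg  [23:0] counter;
    wire [3:0]  dataout;
    wire        tick = (counter == COUNT_MAX);

    ram_module buffer (
        .address(addr),
        .clock(clock),     // free-running clock
        .data(4'b0),       // write port unused in this test
        .wren(1'b0),
        .q(dataout)
    );

    always @(posedge clock or negedge reset) begin
        if (!reset) begin
            addr    <= 5'b0;
            counter <= 24'b0;
            leds    <= 4'b0;
        end else if (tick) begin
            counter <= 24'b0;
            leds    <= dataout;       // word at the current address
            addr    <= addr + 5'b1;   // 5-bit address wraps from 31 to 0
        end else begin
            counter <= counter + 24'b1;
        end
    end
endmodule
```

Because the address stays put for millions of clocks between ticks, dataout is stable long before it is copied to the LEDs, whatever the read latency of the RAM is.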

Ah, it works by the way :)
Honestly I was really surprised it worked immediately, at the first attempt... I guess I just got lucky, but that wouldn't have been possible without the great docs I already mentioned in this post; kudos to Altera for that one.



