Multi-core DSP multi-channel synchronous clock signal design analysis

Abstract: Multi-core digital signal processors (DSPs) are equipped with a variety of peripheral interfaces, each requiring its own independent reference clock. Because multi-core DSPs process data at high speed, the clock requirements for these interfaces are stringent, and when multiple interfaces operate simultaneously, precise clock synchronization becomes essential. This paper explores the clock design of the multi-core DSP TMS320C6678. The CDCM6208 clock chip is used to provide multiple clock signals at different frequencies to the DSP. The paper details the initialization and configuration of the clock chip, as well as the software and hardware design methods involved.

Introduction

Multi-core processors have become a key trend in modern electronics. These devices integrate multiple similar or heterogeneous processors on a single chip, significantly enhancing computational power. Digital Signal Processors (DSPs) are widely used due to their superior performance in digital signal processing. As an example of high-performance multi-core DSPs, the TMS320C6678 contains eight cores that can operate independently or in parallel. When working together, these cores require efficient communication and handshake mechanisms, especially in time-sensitive applications where clock stability and synchronization are critical. Ensuring stable and synchronized clocks places high demands on system design, including clock sources, distribution, PCB layout, and shielding techniques.

This article discusses the use of the CDCM6208 clock distribution chip to supply multiple clock signals to the C6678 multi-core DSP. These include the core clock, DDR3 read/write clock, RapidIO and PCIe transmission clocks, and Gigabit network accelerator clocks. The paper covers the detailed circuit design, clock chip configuration, and initialization of related on-chip components.

1. C6678 and Its Architecture

The C6678 is an 8-core floating-point DSP from Texas Instruments, capable of operating at up to 1.25 GHz. Each core provides 40 GMAC of fixed-point or 20 GFLOP of floating-point performance, so a single chip delivers up to 320 GMAC or 160 GFLOP (8 × 40 and 8 × 20). The internal architecture of the C6678 is illustrated in Figure 1.

[Figure 1: Internal architecture of the C6678]

Each core of the C6678 has 32 KB of program memory, 32 KB of data memory, and 512 KB of Level 2 cache. The chip also features a 4 MB shared SRAM. It includes a DDR3 controller interface that supports external DDR3 memory with a maximum addressable range of 8 GB. The C6678 integrates interfaces such as RapidIO, PCIe, EMIF, SPI, and I2C, which communicate through an on-chip high-speed interconnect bus.

The network-related on-chip modules are shown in the gray area at the lower right of Figure 1. These include two SGMII interfaces, Ethernet switching modules, security accelerators, and packet accelerators, enabling fast data detection, verification, and protocol compliance. These modules help discard invalid data, reducing CPU workload. To speed up data exchange between the network and the CPU, the chip uses an on-chip queue manager for packet buffering and distribution, using packet DMA instead of CPU intervention.

Other on-chip components include PLLs, emulation ports, semaphores, power management, and reset management. The PLLs generate the clock for the CPU and peripherals, while the emulation port allows for software monitoring. Semaphores manage task control in the DSP/BIOS OS, and power management controls voltage and current across the chip. Reset management supports both full and partial boot modes.

2. CDCM6208 and Its Structure

The CDCM62xx series, developed by Texas Instruments, is designed for multi-core processors. The CDCM6208, the second-generation chip, offers significant power savings, reducing consumption from 2–3 W in earlier versions to about 0.5 W, while its functionality, performance, and size remain unchanged. The CDCM6208 has two selectable clock inputs and eight output channels. Four of the outputs support integer division, while the other four allow fractional division, meeting the diverse clocking needs of multi-core chips. The chip supports LVPECL, CML, HCSL, and LVDS signal levels, with a maximum output frequency of 800 MHz for high-speed interfaces such as RapidIO and PCIe. Its clock jitter is less than 265 fs, and it can be controlled via SPI or I2C. TI provides a graphical configuration tool in which users select and set the clock modes.

The software generates register values from the desired output frequencies, and these values are written to the chip via SPI or I2C to complete the configuration.
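As a rough illustration of the "generate register values, then write them" step, the sketch below packs a list of (address, value) pairs into byte frames ready to be sent over SPI. The frame layout (16-bit address followed by 16-bit data, big-endian), the helper name build_frames, and the register values are all assumptions made for illustration; the real frame format and register contents must come from the CDCM6208 datasheet and TI's configuration tool.

```python
import struct

def build_frames(register_values):
    """Pack (address, value) pairs into 4-byte big-endian write frames.

    ASSUMPTION: a 16-bit address word followed by a 16-bit data word.
    This layout is hypothetical and is not taken from the datasheet.
    """
    frames = []
    for address, value in register_values:
        if not (0 <= address <= 0xFFFF and 0 <= value <= 0xFFFF):
            raise ValueError("address and value must each fit in 16 bits")
        frames.append(struct.pack(">HH", address, value))
    return frames

# Placeholder values only; a real list would be exported from the vendor tool.
placeholder_registers = [(0x0000, 0x1234), (0x0001, 0xABCD)]
frames = build_frames(placeholder_registers)

for frame in frames:
    print(frame.hex())
```

Each frame would then be handed to whatever SPI driver the board provides; that driver call is deliberately left out because it depends on the host platform.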

The internal structure of the CDCM6208 is shown in Figure 2. After one of the two input clocks is selected, a 14-fold multiplier is used as a reference to drive the on-chip VCO. To improve phase noise, the multiplied signal passes through an on-chip filter, which can be adjusted with an RC circuit. The VCO output feeds two prescalers, each of which can divide by 4, 5, or 6. After the prescaler, the clock enters the output divider stage, which includes two fractional dividers and one integer divider, and the divided clock is then buffered and driven off-chip. As seen on the right side of Figure 2, the integer-divider outputs Y0 and Y1 produce the same frequency, as do Y2 and Y3. The fractional-divider outputs Y4–Y7 offer more flexibility, at the cost of higher power consumption. This configuration meets most multi-core processor needs, especially those of TI's C66x and 66AK2x series DSPs.
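The divider chain described above (VCO, then a prescaler of 4, 5, or 6, then an integer or fractional output divider) can be checked with a few lines of arithmetic. This is only a sketch: the 2500 MHz VCO frequency, the chosen divider values, and the function name output_mhz are assumptions picked so that the results match the reference clocks used in this design, not settings taken from the actual configuration.

```python
from fractions import Fraction

ALLOWED_PRESCALERS = (4, 5, 6)  # prescaler ratios named in the text

def output_mhz(vco_mhz, prescaler, divider):
    """Frequency after one VCO -> prescaler -> output divider path."""
    if prescaler not in ALLOWED_PRESCALERS:
        raise ValueError(f"prescaler must be one of {ALLOWED_PRESCALERS}")
    return Fraction(vco_mhz) / prescaler / Fraction(divider)

VCO_MHZ = 2500  # ASSUMPTION: illustrative VCO frequency

srio_ref = output_mhz(VCO_MHZ, 4, 2)                # integer divider
core_ref = output_mhz(VCO_MHZ, 5, 5)                # integer divider
ddr3_ref = output_mhz(VCO_MHZ, 5, Fraction(15, 2))  # fractional divider (7.5)

print(float(srio_ref), float(core_ref), round(float(ddr3_ref), 3))
# prints: 312.5 100.0 66.667
```

Using exact fractions rather than floats keeps the 66.667 MHz case (200/3 MHz) free of rounding error until the final print.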

[Figure 2: Internal structure of the CDCM6208]

3. Hardware Design

Figure 3 shows the clock requirements of the TMS320C6678 in this application. The core reference clock is 100 MHz, which the on-chip PLL multiplies and locks to a core frequency between 700 MHz and 1 GHz. The RapidIO and HyperLink interfaces both take a 312.5 MHz reference. For RapidIO, multipliers of 4, 8, 10, or 16 give frequencies up to 5 GHz (312.5 MHz × 16). For HyperLink, multipliers of 40, 80, 100, or 160 give speeds up to 50 GHz (312.5 MHz × 160).
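The multiplier figures quoted above follow directly from the 312.5 MHz reference. The short sketch below reproduces them; the variable names are illustrative, and the multiplier lists are copied from the text rather than from a datasheet.

```python
from fractions import Fraction

REF_MHZ = Fraction("312.5")            # shared RapidIO / HyperLink reference
RAPIDIO_MULTIPLIERS = (4, 8, 10, 16)
HYPERLINK_MULTIPLIERS = (40, 80, 100, 160)

def multiplied_ghz(ref_mhz, multipliers):
    """Map each multiplier to the resulting frequency in GHz."""
    return {m: float(ref_mhz * m / 1000) for m in multipliers}

rapidio_ghz = multiplied_ghz(REF_MHZ, RAPIDIO_MULTIPLIERS)
hyperlink_ghz = multiplied_ghz(REF_MHZ, HYPERLINK_MULTIPLIERS)

print(rapidio_ghz)    # {4: 1.25, 8: 2.5, 10: 3.125, 16: 5.0}
print(hyperlink_ghz)  # {40: 12.5, 80: 25.0, 100: 31.25, 160: 50.0}
```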

The PCIe interface and the PA_SS network accelerator both use a 100 MHz input, which is multiplied internally to meet the interface requirements. The DDR3 reference clock is 66.667 MHz, multiplied by 20 or 25 to reach 1.333 GHz or 1.667 GHz. Each of these clocks has its own independent PLL circuit, and the setup procedures are similar. The 25 MHz clock in Figure 3 is dedicated to the Gigabit network and is provided by a crystal oscillator. The C6678 also provides a clock output signal, which defaults to 1/6 of the 100 MHz core reference clock (16.667 MHz) and can be used to monitor the chip's operating status.
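The DDR3 and clock-output figures can be verified the same way. In this sketch the 66.667 MHz reference is treated as exactly 200/3 MHz, which is an assumption about the intended value, and the variable names are illustrative.

```python
from fractions import Fraction

DDR3_REF_MHZ = Fraction(200, 3)   # ASSUMPTION: 66.667 MHz taken as exactly 200/3 MHz
CORE_REF_MHZ = Fraction(100)      # core reference clock from the text

# DDR3 PLL output for the two multipliers named in the text
ddr3_mhz = {m: DDR3_REF_MHZ * m for m in (20, 25)}

# Default clock output: reference divided by 6
clk_out_mhz = CORE_REF_MHZ / 6

for m, f in ddr3_mhz.items():
    print(f"x{m}: {float(f) / 1000:.3f} GHz")
print(f"clock output: {float(clk_out_mhz):.3f} MHz")
```

With these inputs the two DDR3 results come out as 4000/3 MHz and 5000/3 MHz, which round to 1.333 GHz and 1.667 GHz, and the clock output is 50/3 MHz, or 16.667 MHz.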
