The hardware/software co-synthesis problem is that of finding an architecture, subject to given constraints, for a set of tasks that are related through data dependencies. The architecture consists of a set of heterogeneous processing elements and a communication structure connecting them. This thesis presents a new co-synthesis algorithm that targets distributed memory architectures. The algorithm consists of four distinct phases: processing element selection, pipelined task allocation, scheduling, and best topology selection. The selected processing elements are finally mapped onto a regular distributed memory architecture with a mesh, hypercube, or quad-tree topology. The co-synthesis method is demonstrated by applying it to an MPEG encoder application and to large random graphs of various sizes.
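To make the final phase more concrete, the sketch below compares the three target topologies under a simple, assumed cost model. It is not the thesis's actual method. It assumes that processing element `i` is placed on node `i`, and that a topology's cost is the sum over PE pairs of `traffic[i][j]` times the hop distance between the two nodes. The lowest-cost topology is chosen. The function names, the row-major mesh layout, and the 4-ary tree numbering (parent of node `i` is `(i - 1) // 4`) are all hypothetical choices made for illustration.

```python
from math import isqrt

# Hypothetical cost model: PE i sits on node i; cost = sum(traffic * hops).

def mesh_distance(a: int, b: int, n: int) -> int:
    """Manhattan distance on a 2-D mesh filled row by row, ceil(sqrt(n)) columns wide."""
    cols = isqrt(n - 1) + 1  # equals ceil(sqrt(n)) for n >= 1
    ra, ca = divmod(a, cols)
    rb, cb = divmod(b, cols)
    return abs(ra - rb) + abs(ca - cb)

def hypercube_distance(a: int, b: int, n: int) -> int:
    """Hop count in a hypercube: Hamming distance between the binary node labels."""
    return bin(a ^ b).count("1")

def quadtree_distance(a: int, b: int, n: int) -> int:
    """Hop count in a 4-ary tree where node i's parent is (i - 1) // 4."""
    def ancestors(x: int) -> list[int]:
        path = [x]
        while x > 0:
            x = (x - 1) // 4
            path.append(x)
        return path  # ordered from x up to the root (node 0)

    pa, pb = ancestors(a), ancestors(b)
    in_pb = set(pb)
    lca = next(x for x in pa if x in in_pb)  # lowest common ancestor
    return pa.index(lca) + pb.index(lca)

TOPOLOGIES = {
    "mesh": mesh_distance,
    "hypercube": hypercube_distance,
    "quad-tree": quadtree_distance,
}

def select_topology(traffic: list[list[int]]) -> tuple[str, dict[str, int]]:
    """Return the topology with the lowest weighted hop count, plus all costs.

    Ties are broken by the insertion order of TOPOLOGIES.
    """
    n = len(traffic)
    costs = {
        name: sum(traffic[i][j] * dist(i, j, n)
                  for i in range(n) for j in range(n) if i != j)
        for name, dist in TOPOLOGIES.items()
    }
    best = min(costs, key=costs.get)
    return best, costs

if __name__ == "__main__":
    # PE 0 sends one unit of data to each of PEs 1, 2 and 3 (a star pattern).
    star = [[0, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    print(select_topology(star))
```

For the star pattern this prints `('quad-tree', {'mesh': 4, 'hypercube': 4, 'quad-tree': 3})`, because every PE is a direct child of node 0 in the tree. A realistic topology selector would also search over placements instead of fixing PE `i` to node `i`, and it would account for link contention and latency. Those refinements are deliberately left out of this sketch.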