Using Apache Thrift to Integrate Haskell and Python

Posted on September 11, 2011

For those following along, I’ve been working on building a rate of return (‘RoR’) calculation system using Haskell (see parts 1, 2 & 3). The system operates offline; that is, the RoR calculations are done monthly as part of a batch process and the results stored in a database for reporting via an existing application.

This system has worked well, as the web-based application just needs to report the RoR results, not show the steps in calculating them. This has been sufficient for the purpose of reconciling the time-weighted RoR numbers from our system with the numbers produced by our wealth management platform.

However, we’ve found that when we calculate returns using the dollar-weighted (or IRR) calculation method, it’s helpful to also report the component cashflows, and not just the final results.
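For readers unfamiliar with the dollar-weighted method, here is a minimal Python sketch of an IRR calculation over dated cashflows. It is not the Haskell implementation from this series: the `npv` and `irr` helpers, the bisection search, the `(days_from_start, amount)` layout and the sample numbers are all assumptions made for illustration.

```python
def npv(rate, cashflows):
    """Net present value of (days_from_start, amount) pairs at an annual rate."""
    return sum(amount / (1.0 + rate) ** (days / 365.0)
               for days, amount in cashflows)


def irr(cashflows, lo=-0.99, hi=10.0, tol=1e-9):
    """Find the annual rate at which NPV is zero, by bisection.

    Assumes NPV changes sign exactly once between lo and hi, which holds
    for a simple "invest first, get paid later" series.
    """
    f_lo = npv(lo, cashflows)
    if f_lo * npv(hi, cashflows) > 0:
        raise ValueError("NPV does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = npv(mid, cashflows)
        if f_lo * f_mid <= 0:
            hi = mid                 # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid    # root lies in [mid, hi]
    return (lo + hi) / 2.0


# Hypothetical data: invest 1000, add 500 half a year later, end with 1600.
flows = [(0, -1000.0), (182, -500.0), (365, 1600.0)]
rate = irr(flows)
```

Because the rate depends on the size and timing of every cashflow, seeing the individual flows next to the final number makes the result much easier to reconcile.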

After looking at the various options for integrating Haskell and Python, I chose to use the Thrift framework; I created an IRR service in Haskell, which is called by the Python code. The software stack looks like this:

I’m not going to go step by step through the Thrift framework in this post; instead, I’m just going to provide some snippets of the code that got the system working for me.



The first step in working with Thrift (after installing, etc.) is creating the thrift file. For our application we are only interested in calling a single function from the Haskell RoR code:

    -- | Primary function to get returns for [account]
    portfolioIRR :: [AccountId]  -- ^ list of accounts
                 -> ISODateInt   -- ^ endDate for return period
                 -> IO (Maybe ((Double, RoR, Period), [(ISODateInt, Cashflow, Double, Cashflow)]))

Given a list of accounts and an end date, the portfolioIRR function returns a tuple (wrapped in Maybe, so Nothing means no results were found). The first element is itself a tuple holding the sum of the IRR cashflows, the RoR and a Period data type. The second element is a list of cashflows, each a tuple of date, cashflow, cashflow-days and cf-day-amount.
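To make the nesting concrete, here is a small Python sketch of the same shape using plain tuples, flattened into records. The names `CashflowRow`, `IrrResult` and `flatten`, and the sample values, are hypothetical and exist only for illustration. The sketch drops the Period value on the assumption that the caller only needs the sum, the rate and the cashflow rows.

```python
from collections import namedtuple

# Illustrative record types; the field names follow the description above.
CashflowRow = namedtuple("CashflowRow", "isodate cashflow days rorcashflow")
IrrResult = namedtuple("IrrResult", "cashflowsum rate cashflows")


def flatten(result):
    """Turn ((sum, rate, period), [rows]) into a flat record, or None."""
    if result is None:  # corresponds to Nothing on the Haskell side
        return None
    (cashflowsum, rate, _period), rows = result
    return IrrResult(cashflowsum, rate, [CashflowRow(*row) for row in rows])


# Hypothetical values, only to show the nesting.
sample = ((100.0, 0.05, "period-placeholder"),
          [(20110131, -1000.0, 365.0, -1050.0),
           (20111231, 1100.0, 0.0, 1100.0)])
flat = flatten(sample)
```

Flattening like this matters because Thrift has no tuple type, so a nested result has to be expressed as named structs and lists.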

Our thrift file needs to define this function and the service to call the function:

    struct Cashflow {
      1: i32 isodate,
      2: double cashflow,
      3: double days,
      4: double rorcashflow
    }

    struct IRR {
      1: double rate = 0,
      2: double cashflowsum,
      3: list<Cashflow> cashflows
    }

    exception NoIRRresults {
      1: string why
    }

    service IRRCalculator {
       // element type of accounts assumed to be i32
       IRR portfolioIRR(1:list<i32> accounts, 2:i32 enddate)
    }

I used the thrift file to generate both Haskell and Python client/server libraries.

The Haskell Server

    module Main where

    import Data.List
    import System.IO
    import Network
    import Data.Maybe
    import Data.Int
    import Control.Exception (throw)
    import qualified Data.Map as M

    -- ror libraries
    import Types
    import RoR

    -- Thrift libraries
    import Thrift
    import Thrift.Transport.Handle
    import Thrift.Protocol
    import Thrift.Protocol.Binary
    import Thrift.Server

    -- Generated Thrift modules
    import Ror_Types
    import qualified IRRCalculator
    import IRRCalculator_Iface

    port :: PortNumber
    port = 8008

    -- convert portfolioIRR results into correct types
    makeCashflow :: (ISODateInt, Types.Cashflow, Double, Types.Cashflow) -> Ror_Types.Cashflow
    makeCashflow (isodate,cashflow,days,rorcashflow) = Ror_Types.Cashflow {
      f_Cashflow_isodate = Just (fromIntegral isodate :: Int32),
      f_Cashflow_cashflow = Just cashflow,
      f_Cashflow_days = Just days,
      f_Cashflow_rorcashflow = Just rorcashflow
    }

    -- the Period is not part of the Thrift IRR struct, so it is ignored here
    makeIrr :: (Double, RoR, Period) -> [Ror_Types.Cashflow] -> Ror_Types.IRR
    makeIrr (cashflowsum,rate,_period) cashflows = Ror_Types.IRR {
      f_IRR_rate = Just rate,
      f_IRR_cashflowsum = Just cashflowsum,
      f_IRR_cashflows = Just cashflows
    }

    data IRRCalculatorHandler = IRRCalculatorHandler

    newIRRCalculatorHandler = do
        --log <- newMVar mempty
        return $ IRRCalculatorHandler --log

    instance IRRCalculator_Iface IRRCalculatorHandler where

        portfolioIRR _ accounts enddate = do
            portirr <- RoR.portfolioIRR accounts' enddate'
            case portirr of
                Nothing -> do
                    print "no IRR results found"
                    -- can't get this to work:  throw $ NoIRRresults { f_NoIRRresults_why = Just "No irr results"}
                    -- ugly... return empty results
                    return $ makeIrr (0,0,Period {endDate=0,netCashflow=0,endPositions=M.empty}) []
                -- match on the pair directly instead of calling fromJust again
                Just (irrInfo,cfs') -> do
                    print "IRR results found"
                    return $ makeIrr irrInfo (map makeCashflow cfs')
            where accounts' = map fromIntegral $ fromJust accounts
                  enddate'  = fromIntegral $ fromJust enddate

    main :: IO ()
    main = do
        handler <- newIRRCalculatorHandler
        print "Starting the server..."
        runBasicServer handler IRRCalculator.process port
        print "done."

The Python Client

The Python application communicates with the Haskell server using the following code:

    import sys
    import ror.IRRCalculator
    from ror.ttypes import *

    from thrift import Thrift
    from thrift.protocol import TBinaryProtocol
    from thrift.transport import TTransport
    from thrift.transport import TSocket

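    transport = TSocket.TSocket('localhost', 8008)
    transport = TTransport.TBufferedTransport(transport)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)

    client = ror.IRRCalculator.Client(protocol)

    # the transport must be opened before the first call
    transport.open()
    try:
        # accounts and form come from the surrounding request handler
        irr = client.portfolioIRR(accounts, int(form.d.end_date))
    finally:
        transport.close()

    # the server signals "no results" with an all-empty IRR struct
    if (irr.rate == 0) and (irr.cashflowsum == 0) and (irr.cashflows == []):
        raise ObjectNotFoundError("could not find account or portfolio IRR data")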


The irr object can be used like any Python object (in this case it was used in a template to show the IRR and component cashflows). I was not able to get Haskell exceptions working with Thrift; the Thrift service would just hang, so I resorted to sending an empty data set as an indication of an error.
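If the empty-result convention is needed in more than one place, the check can be pulled into a small helper. The sketch below is one way to do that, not the code from the application: `is_empty_irr`, `require_irr` and `FakeIRR` are hypothetical names, `FakeIRR` stands in for the Thrift-generated IRR struct, and `ObjectNotFoundError` is redefined here only so the example is self-contained.

```python
class ObjectNotFoundError(Exception):
    """Stand-in for the application's own error type."""


def is_empty_irr(irr):
    """True when the server sent the all-empty 'no results' placeholder."""
    # "not irr.cashflows" covers both an empty list and an unset (None) field.
    return irr.rate == 0 and irr.cashflowsum == 0 and not irr.cashflows


def require_irr(irr):
    """Return irr unchanged, or raise if it is the empty placeholder."""
    if is_empty_irr(irr):
        raise ObjectNotFoundError("could not find account or portfolio IRR data")
    return irr


class FakeIRR(object):
    """Hypothetical stand-in for the Thrift-generated IRR struct."""

    def __init__(self, rate=0.0, cashflowsum=0.0, cashflows=None):
        self.rate = rate
        self.cashflowsum = cashflowsum
        self.cashflows = cashflows
```

Requiring all three fields to be empty keeps a genuine zero return from being mistaken for the placeholder, since a real result would still carry cashflow rows.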