How to jumpstart ML in production (from a friend)

These days, I cannot find anything that does not have some ML spice in it. To a certain extent, the euphoria around ML is starting to sound like the move from Data to BigData. Many big companies like MSFT/Google/FB have long taken the strong stance that intelligence will power many scenarios (Google Home/Alexa/Bots/Cortana/Bing Prediction/etc). So, yeah, it is great to hear that Traveloka is also investing in ML early.

If there is a tech startup I want to build, it is probably around building world-class infrastructure for ML -- transitioning from where I am now to do this is a bit challenging and too 'costly', though I am still very much keeping my eyes and ears open for any opportunity.

Having said that, many problems do not require going as far as distributed ML. Hence, if you use Caffe/Theano/TensorFlow to build a single-machine service, in my opinion it is generally just like building any other (large-scale) service -- e.g. similar to most search engines, where you have one pipeline that builds the index (doing serious processing, building the ranking model, etc.) and another pipeline that does the serving (i.e. loading the model and index, and executing the ranking, which often uses some machine learning model). The difficult part of running ML is probably in how you build the toolsets surrounding your service -- how do you debug your model? How do you experiment with/explore the hyperparameters? How do you enable a 'layman' to run ML experiments, given that you don't have that many data scientists? Are you doing Image/Voice (dense) vs Text (sparse), and so GPU/FPGA vs CPU-based? Also, many cloud providers (AWS/GCP/Azure) now have pretty good ML services that will cover many basic scenarios; have you considered them?
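To make the two-pipeline idea a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: a toy linear model stands in for whatever you would train in Caffe/Theano/TensorFlow, a JSON file stands in for the model artifact, and every name in it (train, build_pipeline, Server, model.json) is hypothetical. The offline side tries a few learning rates and writes the best model to disk; the serving side only loads that file and answers requests.

```python
import json
import os
import tempfile


def train(examples, learning_rate, epochs=2000):
    """Fit y = w*x + b by batch gradient descent; return the model as a dict."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in examples) * 2 / n
        grad_b = sum((w * x + b - y) for x, y in examples) * 2 / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return {"w": w, "b": b}


def mean_squared_error(model, examples):
    """Average squared prediction error of the model on the given examples."""
    return sum((model["w"] * x + model["b"] - y) ** 2
               for x, y in examples) / len(examples)


def build_pipeline(train_set, valid_set, learning_rates, model_path):
    """Offline pipeline: try each hyperparameter value, keep the model with
    the lowest validation error, and write it to disk as the artifact."""
    best = None
    for lr in learning_rates:
        model = train(train_set, lr)
        error = mean_squared_error(model, valid_set)
        if best is None or error < best["error"]:
            best = {"model": model, "learning_rate": lr, "error": error}
    with open(model_path, "w") as f:
        json.dump(best, f)
    return best


class Server:
    """Serving pipeline: load the artifact once, then answer predictions."""

    def __init__(self, model_path):
        with open(model_path) as f:
            artifact = json.load(f)
        self.w = artifact["model"]["w"]
        self.b = artifact["model"]["b"]

    def predict(self, x):
        return self.w * x + self.b


if __name__ == "__main__":
    # Toy data generated from y = 2x + 1 (an assumption for the demo).
    data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
    train_set, valid_set = data[:7], data[7:]
    path = os.path.join(tempfile.mkdtemp(), "model.json")
    best = build_pipeline(train_set, valid_set, [0.001, 0.01, 0.03], path)
    server = Server(path)
    print(best["learning_rate"], server.predict(4.0))
```

The point of the split is that the serving side never imports the training code; it only depends on the artifact format. That makes it easier to swap models, and the record written next to the model (which learning rate won, with what validation error) is a very small start on the experiment and debugging tooling mentioned above.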

There are some general slides and posts that I am sure you can find through Googling, like these:
http://www.slideshare.net/dato-inc/machine-learning-in-production

https://code.facebook.com/posts/1072626246134461/introducing-fblearner-flow-facebook-s-ai-backbone/

http://deeplearning.net/tag/google-brain/

https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-chilimbi.pdf

Companies like Baidu and Alibaba are also trying to replicate ML services like Project Adam (my project at MSR), Google Brain, and FBLearner Flow. If you are aiming to solve more complex problems in ML, then I guess this is the kind of direction you would want to consider.