May 01, 2016

Machine Learning, Big Data, and Finding Alpha in the Noise

According to Darwin’s Origin of Species, it is not the most intellectual of the species that survives; it is not the strongest that survives; but the species that survives is the one that is able best to adapt and adjust to the changing environment in which it finds itself.
~ Leon C. Megginson1

In 2014, Demis Hassabis demonstrated software capable of teaching itself to play classic Atari video games (Pong, Breakout, and Enduro) with no instructions. The software was equipped only with access to the controls and the display, knowledge of the score, and instructions to make the score as high as possible.2 In 15 minutes the software could move from having no understanding of a video game to beating a human expert. According to a 2015 paper in Nature by the DeepMind team: "We demonstrate that a single architecture can successfully learn control policies in a range of different environments with only very minimal prior knowledge, receiving only the pixels and the game score as inputs, and using the same algorithm, network architecture and hyper-parameters on each game, privy only to the inputs a human player would have."

Google CEO Larry Page called the technology of Hassabis’s company, DeepMind, “one of the most exciting things I’ve seen in a long time,” and Google bought the company one month later.3 It's now part of the Google Brain initiative.

As an adolescent, Hassabis founded a successful video game company. He later earned a degree in computer science. Despite early success in the video game industry, he wanted to better understand human intelligence, and in 2005 he enrolled in a neuroscience PhD program at University College London. Hassabis published a study in 2007 that was recognized by the journal Science as a “Breakthrough of the Year.” He showed that the hippocampus—a part of the brain thought to be concerned only with the past—is also crucial to planning for the future.4

The feats of Hassabis’s Atari-playing software were based on advances on the theoretical work of Geoffrey Hinton. In 2006, Hinton, a University of Toronto computer science professor (and now at Google Brain), developed a more efficient way to teach individual layers of neurons in an artificial neural network. In Hinton’s neural algorithm, the first layer of neurons learns primitive features, like an edge in an image or the smallest unit of speech sound. It does this by identifying combinations of pixels or sound waves that occur more often than they should by chance. Once that layer accurately recognizes those features, they’re fed to the next layer, which trains itself to recognize more complex features, like a corner or a combination of speech sounds. The process is repeated in each layer until the system can reliably recognize phonemes or objects.5 This approach allows enough flexibility and specificity that it can both recognize a face (based on static features) and identify the expression on the face (based on different features common across faces).

In addition to other innovations, Hassabis added targeted feedback loops to Hinton’s work. DeepMind’s Atari-playing software replayed its past experiences over and over to make the most accurate predictions for the optimal next move. According to Hassabis, this function was inspired by the rumination on the day’s events performed by the sleeping human brain: “When you go to sleep your hippocampus replays the memory of the day back to your cortex,” all the while extracting and learning from the most relevant and useful patterns.6
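The replay idea described above can be sketched in a few lines. This is only an illustration under loud assumptions: it uses a simple lookup table of values (tabular Q-learning) in place of DeepMind's deep network, and the names `ReplayBuffer` and `replay_update`, the learning rate `alpha`, and the discount `gamma` are all hypothetical choices made for this example.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size memory of past (state, action, reward, next_state) steps."""

    def __init__(self, capacity):
        # A deque with maxlen silently discards the oldest experience when full.
        self.memory = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        # Draw past experiences at random so learning is not dominated
        # by the most recent, highly correlated steps.
        k = min(batch_size, len(self.memory))
        return rng.sample(list(self.memory), k)


def replay_update(q_table, batch, actions, alpha=0.1, gamma=0.9):
    """Apply a tabular Q-learning update to each replayed experience.

    Assumption: a dict keyed by (state, action) stands in for the neural
    network used in the real system.
    """
    for state, action, reward, next_state in batch:
        best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
        old = q_table.get((state, action), 0.0)
        q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q_table
```

In use, the agent would call `add` after every move and periodically call `replay_update(q_table, buffer.sample(32), actions)`, revisiting old experiences many times rather than learning from each one only once.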

DeepMind’s breakthroughs are based on algorithms that can learn not only precise details, but also features that differentiate one problem space from another. Furthermore, the algorithms repeatedly review the intricate relationships between the present and the past to find shortcuts that will improve future forecasts. In this way, by first defining the context (in markets, the regime), such algorithms mirror the decision styles of history’s great investors. A decent technical introduction is in their 2015 Nature paper.

The above is an excerpt from our new book, "Trading on Sentiment" (Wiley, 2016) which Investing.com called "a seminal work."

In summary, deep learning is one of the most revolutionary technologies ever developed. It is a class of algorithms inspired by how the human brain works, and it enables self-driving cars, lets YouTube identify faces and animals in videos, and lets Siri understand and process speech in milliseconds. Deep learning automatically extracts new model features. Processes that previously took experts prohibitive amounts of time can now be automated and accomplished in a fraction of that time. It can also be used for extremely accurate forecasting and prediction. A technical summary is available at DataScienceCentral.

Today's newsletter explores deep learning, its applications to quantitative investing, and how human investors can still find an edge against the machines. It's a high-level quant newsletter today, and the technical details won't be of interest to most readers. However, it's an important topic because machines are slowly taking over many trading applications, including the hunt for alpha.

Context and Regimes


In quantitative investing, deep learning could be dismissed as a surefire way to overfit on data. However, as we will describe in more detail below, an appropriate setup of deep learning can improve results significantly. In particular, learning algorithms that first identify the market context in which their strategies are deployed (the regime) are better prepared to learn how markets dynamically adapt to information flow. Academic research in finance does not yet use deep learning (interdisciplinary research is often slow in coming), but it does support the value of understanding context. For example, research by Elijah DePalma at Thomson Reuters demonstrates that the performance of common investment strategies differs across market regimes, and these differences may be rooted in the divergent mental states of traders in each context (e.g., optimism in a bull market versus pessimism in a bear market).

Historically, many investors have used the VIX to define market regimes as calm or volatile. As DePalma showed in the whitepaper mentioned above, sentiment can also define market regimes. Our own data product - the Thomson Reuters MarketPsych Indices (TRMI) - was built to address the problem of dimension reduction in media flow, in part to improve regime detection. The TRMI quantify and aggregate the information that is directly meaningful and impactful to traders in the form of granular sentiment indexes like "fear" and "joy" as well as macroeconomic indexes like "earningsForecast" and "fundamentalStrength" suggested by a review of the academic literature.

In the new world of machine-learned strategies, most systems use a switching mechanism to change algorithms as regimes shift. Given that deep learning is modeled on the neural basis of human decision making, it helps to consider how such human decision making changes depending on the context. For example, in the midst of a market panic, traders think and behave very differently than in the midst of a gradual bull market. A network that generalizes information like a human mind under stress will perform better during a market panic. However, when markets are quiet, a more complex network architecture can ascertain the nuances of information flow and price behavior. Research supports the use of such regime-dependent approaches in more primitive forms (e.g., switching from value to momentum strategies depending on the VIX level).
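The primitive form of regime switching mentioned above might look like the sketch below. Everything specific in it is an assumption made for illustration: the VIX cutoff of 20, the regime labels, and in particular the mapping of regimes to strategies, which in practice is an empirical question that would need to be settled on historical data.

```python
def classify_regime(vix_level, calm_threshold=20.0):
    """Label the market regime from the VIX level.

    The threshold of 20 is an illustrative assumption, not a recommendation.
    """
    return "calm" if vix_level < calm_threshold else "volatile"


# Hypothetical mapping from regime to strategy. Which strategy suits which
# regime is not asserted here; swap the values to test the alternative.
STRATEGY_BY_REGIME = {"calm": "momentum", "volatile": "value"}


def select_strategy(vix_level, calm_threshold=20.0,
                    strategy_by_regime=STRATEGY_BY_REGIME):
    """Pick the strategy assigned to the current regime."""
    regime = classify_regime(vix_level, calm_threshold)
    return strategy_by_regime[regime]
```

A learned system would replace the fixed threshold with a model of the regime itself, but the structure is the same: identify the context first, then choose the algorithm.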

With the recent explosion of such machine-readable and granular data sets, deep learning is better able to show its value.  To support the surge of interest in applying machine learning to vast financial datasets, a new ecosystem - including data such as the TRMI - has arisen.

The Data Ecosystem


As a college student in the mid-1990s, I wrote a senior thesis entitled "Predicting the S&P 500 Index with Artificial Neural Networks". I developed genetically optimized neural networks for that research, and at the time I was a devotee of machine learning. I was not alone - in fact it was a fairly common pursuit among hobbyists, supported by a journal called (if I recall correctly) the "Journal of Neural Networks in Finance" as well as financial AI magazines, an AAII neural networks interest group, and academic research. But the artificial intelligence techniques, which then were largely limited to simple back-propagation neural networks and genetic algorithms, were often misapplied. Overfitting was a particular curse. As a consequence of the lack of productive applications in finance, the ecosystem supporting financial data and analytic techniques withered in the late 1990s. However, in the past three years it has undergone a tremendous resurgence along with the general enthusiasm around FinTech.

First in the ecosystem are those who produce machine-ingestible financial data, such as governments and news agencies like Thomson Reuters, Dow Jones, and Bloomberg.

There is also a large business around warehousing data, education, conferences, and the dissemination of analytic techniques (like Thalesians, KDD 2016, KDNuggets, Quandl, and QuantStart). And there is the code - many new styles of analytic software, libraries and packages (e.g. for R, Python, and Orange, among others) as well as shared code repositories in GitHub. Storage companies and analytics platforms like Amazon cloud, Google cloud, and IBM are providing the technical infrastructure to handle the vast data available.

There is a cluster of companies offering competitions to find value in financial datasets, including Kaggle, BattleFin, Quantopian, QuantConnect, and Numer.ai, and there is significant promise in the idea of crowd-sourcing best techniques. A recent blog from Quantopian is revealing - sentiment and behavioral data top the charts of most-discussed data, although most of the results are not groundbreaking (many copy or update academic research).

Given all of these resources, one might expect that machines are already dominating financial markets. But in fact there are still many limitations on human-built quant strategies. For example, there is a lack of freely available high-quality data for algorithm developers. It is very expensive to produce and support data like the Thomson Reuters MarketPsych Indices, and thus it is not widely available to crowd-sourcing websites (however, if you're an academic, you may be qualified for a free trial - contact us if so). Other limitations include the lack of interdisciplinary experience (individuals literate in trading, coding, and behavioral sciences). This is a real handicap. Much research is limited by the lack of interdisciplinary knowledge and falls into the same traps that have been well known for 30 years in other fields (in fact, I see papers in the last few years repeating research that was first done 20+ years ago). And perhaps the most significant constraint is the human tendency to fall into false beliefs, refuse to challenge weak assumptions, and engage in magical thinking - these biases weaken the scientific method and often go hand-in-hand with poor data hygiene. Being a coding genius doesn't release one from human biases (in fact, being an expert often reinforces them). Very likely, it is not only the limitations of our mental capacity, but also the frailty of our human nature that will ultimately allow artificial intelligence to out-compete us in financial markets.

Making Sense of Big Data


In applying machine learning techniques to large financial datasets, there are a few rules of the road that are too often ignored:
1.  Ratio of observations to independent variables.  Analysts need a minimum number of prediction samples per indicator used, usually about 30 to 1, with the required ratio rising as the amount of data grows.  That is, if one has 4 years of historical data - about 1000 trading days - and is performing a daily prediction, then using more than 33 independent variables in the input will most likely cause overfitting.  In general, the fewer the inputs - the simpler the model - the better.  Thus we have a need for dimension reduction.
2.  Dimension reduction.  By honing indicators on the training set, one can reduce the number of input variables to the most independently predictive (the most orthogonal).  It is essential not to test on too many correlated variables.  This is where deep learning can find advantages in reducing dimensionality.  It was partly dimension reduction that allowed DeepMind's AlphaGo to defeat one of the world's top-ranked Go players in March 2016.
3.  Selecting/designing learning algorithms.  Many practitioners of machine learning have preferred algorithms - their own suite of statistical tools and techniques that they habitually apply.  But in order to best use machine learning, one must spend considerable time contemplating the problem at hand and the optimal dimension reduction and tools for approaching it.
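The first two rules can be made concrete with a small sketch. The 30-to-1 figure comes from rule 1; the rest is an assumption for illustration. In particular, the greedy correlation filter below is a deliberately simple stand-in for dimension reduction, not the deep learning approach itself, and the function names and the 0.9 correlation cutoff are hypothetical.

```python
import math


def max_inputs(n_observations, samples_per_input=30):
    """Largest number of input variables allowed by the samples-per-input rule."""
    return n_observations // samples_per_input


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def prune_correlated(features, max_abs_corr=0.9):
    """Greedily keep each feature unless it is highly correlated with one already kept.

    `features` maps a feature name to its series of values. Features are
    visited in dict order, so earlier entries take priority.
    """
    kept = []
    for name, series in features.items():
        if all(abs(pearson(series, features[k])) <= max_abs_corr for k in kept):
            kept.append(name)
    return kept


print(max_inputs(1000))  # 1000 trading days at 30 samples per input -> 33
```

To avoid leaking information from the test period, a filter like `prune_correlated` should be fit on the training set only, as rule 2 notes.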

Most deep learning architectures are based on logical representations of human information processing.  In our own data set we've quantified the collective limbic system so its input can be used in such models.  The brain's limbic system influences learning and behavior (optimistic people behave differently than pessimists, for example), so it only makes sense that using limbic inputs in neural systems would improve predictions of human behavior.  One of the greatest challenges in text analysis is sarcasm and irony, and one of the most challenging aspects of trading in markets is misdirection.  

Machines That Lie

“When the AI finds that the only way to win is to show strength, it will do that,” Mr. Churchill says. “If you want to call that bluffing, then the AI is capable of bluffing, but there’s no machismo behind it."
~ University of Alberta computer scientist David Churchill, in the Guardian

Games where the top humans continue to defeat machines are those with a high level of deception, such as StarCraft and poker. According to a WSJ article, unlike chess and Go, where both players can see the entire board at once, StarCraft hides most of the map from each player. Players must send out units to explore the map and locate their opponent.

Most winning investment strategies are constructed on asymmetries in information flow, but some more sophisticated ones deploy outright bluff. "Spoofing" is a term for robots that bluff bids and asks in markets in order to move prices. This is similar to the bluffing performed in StarCraft. Spoofing robots are (maybe) superior to humans in terms of profitability and definitely are superior in speed, but given that spoofing is forbidden, there is little public data on spoofing success rates. However, in one famous case, Navinder Sarao traded 62,077 E-mini S&P contracts with a notional value of $3.5 billion on the day of the flash crash (May 6, 2010) and made "approximately $879,018 in net profits" that day. At one point that day his fake sell orders equaled the entire buy side of the order book. Did his bluffing cause the flash crash? No, he actually pulled the plug an hour before the flash crash, but his spoofing may have contributed to some of the fear.

Human traders will still have an advantage over machines when data is limited and relationships count (e.g., small cap stocks) or where machines cannot access information in machine-readable format (e.g., in some commodity trading, in suppliers' activities, etc.). In the past, as a coach, I've worked with many traders who parsed the words of Ben Bernanke and Janet Yellen to understand their subtle meaning - machines may not be able to adapt to such nuances as well as human traders. I've also worked with other traders who had personal relationships with market makers in illiquid assets of various types - they also will have a persistent advantage in markets regardless of the presence of machines.

However, prop traders who trade on the news or use standard data feeds without any unique insights are unlikely to survive against the machines in this new world.

Housekeeping and Closing

Man’s mind, once stretched by a new idea, never regains its original dimensions.
~ Oliver Wendell Holmes Sr.

While DeepMind used a single network architecture to solve Atari games, the architecture changed for the AlphaGo program. Similarly, in order to predict the financial markets, novel architectures are needed. Just as DeepMind relies upon an understanding of human information processing to defeat humans at games such as chess and Go, beating the stock market will require a detailed understanding of human trader behavior.

In the big picture, DeepMind’s learning algorithms suggest that in order to understand patterns in markets, investors ought to 1) independently consider the big picture—the context—before diving into the details and 2) continually learn from markets what works and what doesn't in the current climate. 

As George Soros’s theory of reflexivity implies, trading behavior is not stable - the media change investor perceptions and sentiments, and these in turn alter investment and economic behavior, moving prices in a feedback loop. It follows that any superior financial predictive algorithms will require the use of media sentiment data.  From our (admittedly biased) perspective, the best quantitative investment strategies of the next decade are likely to be those that integrate behavioral insights, machine intelligence, and big data.  

Awed by the Machines,
Richard L Peterson and the MarketPsych Team

References

1.     Leon C. Megginson, “Lessons from Europe for American Business,” Southwestern Social Science Quarterly 44(1) (1963), pp. 3–13, at p. 4.
2.     Antonio Regalado, “Is Google Cornering the Market on Deep Learning?” MIT Technology Review (January 29, 2014).
3.     http://www.ted.com/talks/larry_page_where_s_google_going_next/transcript?language=en
4.     Tom Simonite, “Google’s Intelligence Designer,” MIT Technology Review (December 2, 2014). Retrieved May 20, 2015 from: http://www.technologyreview.com/news/532876/googles-intelligence-designer/
5.     Robert D. Hof, “10 Breakthrough Technologies 2013: Deep Learning,” MIT Technology Review (April 23, 2013). 
6.     Simonite.