During my third year of grad school, my advisor asked if I was interested in contributing to a special edition of the journal Entropy on the topic of Integrated Information Theory (IIT). My understanding at the time was that IIT was gaining traction as an interesting theory of complexity with a rich mathematical framework, so I welcomed the opportunity. Not only that, but the complexity measure "Phi" at the heart of the theory was a candidate measure of consciousness, meaning that the information-theoretic thing that IIT quantified was supposedly one and the same with subjective experience.
I was immediately skeptical that any scalar mathematical measure could quantify consciousness, as I'm sure anyone not well versed in IIT can empathize with, but it seemed like a lot of people, with a lot more experience than me, were totally on board with Phi as the answer to the hard problem of consciousness (Max Tegmark, for example). Thus, I put my initial doubts aside and decided I had to dig through the details of the theory before I could assess its validity.
Early on, I was captivated by the spirit of IIT. My biggest concern was how one can go from math to consciousness, and IIT speaks directly to this. The basic idea is that if you want to measure consciousness, you have to start with a phenomenological understanding of "what it is like" to be conscious and, from this, "derive" the properties of physical systems that can instantiate this phenomenology. For example, our left and right visual fields are experienced as a single "unified whole", which means information from both eyes must be shared at some point to account for this experience of a unified visual field. Based strictly on this idea, one can posit that physical systems that lack the ability to exchange information necessarily lack consciousness (or take part in two isolated conscious experiences), as the ability to physically "integrate information" seems necessary in order to generate the phenomenal experience of a unified whole. And, in this way, the mathematical formalism of IIT is built.
Mathematical Problems with IIT
Unfortunately, my honeymoon phase with IIT was short-lived. Qualitatively, the theory is great, and there is even an extensive vocabulary invented to go hand-in-hand with the mathematics of the theory ("qualia spaces", "autonomy", "agency", etc.), but the actual math behind these words is horrendous. First, the process of calculating "Phi" (IIT's measure of consciousness) is a nested optimization inside of a nested optimization inside of yet another nested optimization. At each step, one applies the axioms of the theory in order to calculate a local phi value ("little phi"), then one compares these local phi values to each other in order to get a mesoscopic phi value which is then compared to the other mesoscopic values and so on and so forth. Keeping track of all these phi values is extremely tedious to do by hand, which is why I doubt many people have ever actually calculated Phi ("big phi").
In fact, I have a better reason to doubt this, as it was a little-known fact at the time that if you do try to calculate Phi for even the simplest possible system (e.g. an AND and an OR gate connected to each other) you will find that it's impossible. The reason for this is that the axioms of IIT do not address what to do in the event that there are degenerate local phi values. In particular, the exclusion axiom states that as part of the optimization process, you must choose the highest phi value as the "core cause" (another bit of vocab) for a given "mechanism". But, if there are two different core causes with the same phi value (an extremely common occurrence), then IIT does not specify which core cause to choose, and your final results are extremely sensitive to this choice.
Consequently, in calculating the Phi value for a simple AND/OR gate circuit (loosely analogous to a brain with only two neurons), I found that there were 33 different Phi (big Phi) values associated with different choices for the core cause/effect, and these values span the entire range of possible Phi values for the system. Thus, Phi is completely undefined. I dug into the PyPhi package, which is what everyone uses to calculate Phi in practice, and found that, sure enough, it simply takes the first degenerate Phi value it encounters, rather than comparing them in any sort of principled fashion (Figure 1).
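To make the tie problem concrete, here is a minimal sketch in plain Python. It is not PyPhi's code and does not call PyPhi; the purview labels and phi values are made up, and the only thing it illustrates is that a "take the first maximum" rule gives an answer that depends on the order in which tied candidates happen to be listed.

```python
# Hypothetical phi values for the candidate purviews of one mechanism.
# Labels and numbers are invented for illustration; only the tie matters.
candidates = [("A", 0.25), ("B", 0.5), ("AB", 0.5)]


def first_max(items):
    """Return the first item with the largest phi.

    Python's built-in max() keeps the first maximal element it encounters.
    """
    return max(items, key=lambda item: item[1])


def all_max(items):
    """Return every item tied for the largest phi."""
    best = max(phi for _, phi in items)
    return [item for item in items if item[1] == best]


print(first_max(candidates))                  # selects purview "B"
print(first_max(list(reversed(candidates))))  # same data, selects "AB"
print(all_max(candidates))                    # both tied purviews
```

Nothing in the data distinguishes "B" from "AB", so any downstream quantity that depends on which one is selected inherits that arbitrariness.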
At this point, I started to have serious doubts about the validity of IIT. Here we are, twenty years into a proposed theory of consciousness based on a mathematical measure called Phi that isn't even defined! I was blown away by the fact that IIT was so popular, yet no one talked about the fact that Phi isn't unique and can't actually be calculated. I started to believe that perhaps all the rhetoric surrounding Phi was responsible for its popularity and that the actual physical underpinnings were nonexistent. In other words, I started to think this whole theory might have nothing to do with reality. Given that this is the most popular theory of consciousness in contemporary neuroscience and has been growing exponentially in popularity over the last two decades (Figure 2), this was not a heartening thought for a graduate student with no first-author papers to have, especially under a tight deadline.
I was now six months into learning IIT and felt that I needed to publish something soon in order to justify the amount of time I'd spent learning the mathematical formalism of the theory. Unfortunately, the only "result" I had was that Phi could not be calculated at all, which was not an easy result to submit to a special edition devoted entirely to IIT. Not only that, but there was at least one relatively obscure case in the literature where this problem was mentioned and therefore the result was not even new. Had I been more skilled in the art of writing papers, I could probably have pulled this paper off but the details were extremely technical and I wasn't entirely sure what the take-home message was. In addition, I didn't like the idea of a strictly deconstructive contribution with no real remedy to the problem, though I have since changed my mind on this matter. Regardless, I went about looking for a better way to point out the mathematical problems with Phi, hoping for a simple remedy.
Epistemological Problems with IIT
The simple remedy did not exist, as all I could manage to prove were increasingly better reasons why IIT must be fatally flawed. In particular, IIT assumes that physical feedback is a necessary condition for consciousness, such that any circuit/brain that lacks feedback necessarily lacks consciousness. Yet, there is a theorem in automata theory that states anything that can be done with feedback can be done without feedback simply by "unfolding" the feedback connections present in a circuit (the Krohn-Rhodes theorem). In other words, from an engineering perspective, there is absolutely nothing special about the presence of feedback - you can always get rid of the feedback in favor of strictly feedforward logical connections. In light of this, I began to wonder what happens to the Phi value of a system under Krohn-Rhodes decomposition. In theory, it should be possible to construct two different circuits/brains that execute the exact same input-output behavior (philosophical zombies) with and without the presence of feedback. Since feedback is a necessary condition for consciousness in IIT, this implies that one of these circuits would have to be conscious while the other is not.
I went about constructing simple examples to explore this idea, proving that it is always possible to fix the (outward) function of a system while changing its (internal) Phi value. Thus, Phi is completely decoupled from function. It had now been another six months since starting this project, and I thought this would have to be good enough for a paper. Having shown that Phi has nothing to do with input-output behavior, it seemed to me that whatever Phi claims to be measuring, it can't be justified in terms of subjective experience, as one must simply assume that a difference in subjective experience exists in absence of additional functional consequences - an assumption that can't be tested.
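The flavor of such a construction can be sketched with a toy example. This is not the construction from my paper and it computes no Phi values; it is a hand-picked pair of two-bit counters (the function names and encodings are assumptions chosen for illustration) showing that the same input-output behavior can be realized with and without feedback between components. In the Gray-coded version each bit reads the other, while in the binary version information flows only from the low bit to the high bit.

```python
from itertools import product


def step_gray(state, x):
    """Gray-coded mod-4 counter: a reads b and b reads a (a feedback loop)."""
    a, b = state
    if x == 0:
        return (a, b)      # hold the current count
    return (b, 1 - a)      # a' = b, b' = NOT a


def step_binary(state, x):
    """Binary mod-4 counter: the low bit never reads the high bit."""
    hi, lo = state
    return (hi ^ (lo & x), lo ^ x)   # hi' = hi XOR (lo AND x), lo' = lo XOR x


# Readout maps from internal state to the externally visible count.
GRAY_READOUT = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
BINARY_READOUT = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 3}


def run(step, readout, inputs):
    """Feed a sequence of 0/1 inputs and collect the visible outputs."""
    state, outputs = (0, 0), []
    for x in inputs:
        state = step(state, x)
        outputs.append(readout[state])
    return outputs


# Compare the two machines on every input sequence of length 1 to 8.
same = all(
    run(step_gray, GRAY_READOUT, seq) == run(step_binary, BINARY_READOUT, seq)
    for n in range(1, 9)
    for seq in product([0, 1], repeat=n)
)
print(same)
```

From the outside the two machines are indistinguishable on every tested sequence; only the internal wiring differs, which is exactly the kind of difference a measure like Phi is sensitive to.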
I started writing up these results but was plagued by the fact that IIT explicitly addresses the existence of functionally identical systems with different Phi values as part of the 2014 formulation of IIT 3.0. In other words, proponents of IIT were well aware of the fact that philosophical zombies could exist and were somehow completely OK with the idea that what justified the difference in subjective experience (measured by Phi) was Phi itself. Thus, it seemed my contribution was perhaps an interesting way of constructing these systems but something proponents of IIT would easily brush off as inconsequential, as they openly admit that the theory embraces such systems. The deeper issue at hand was one of epistemic justification. How is it that proponents of IIT could justify that what Phi is measuring is in fact consciousness? In the absence of functional differences, how can one say that a zombie system lacks phenomenal properties such as a "unified experience" without simply assuming it to be so? It seemed that the rhetoric being used to justify Phi as a measure of consciousness was grounded entirely in input-output behavior but, if the input-output behavior was fixed, Phi became its own justification scheme in that it was used as both a means and an end.
There was no place this problem was more readily apparent than experiments designed to falsify IIT in a laboratory setting. According to its proponents, if Phi was shown to increase in response to behavioral states we commonly associate with lower levels of subjective experience (e.g. sleep) then the theory was falsified. Yet, the logical validity of this entire argument is based on the premise that outward behavior is an accurate reflection of internal subjective experience. In other words, for this experiment to validate/falsify IIT one must believe the assumption that when a system appears to be asleep it objectively has a lower subjective experience than when a system is awake - the same assumption that proponents of IIT reject in defense of philosophical zombies! Thus, it seems proponents of IIT wanted to have their cake and eat it too. Experimental falsification was one of the reasons for IIT's meteoric rise to fame and, indeed, multimillion dollar efforts are still underway to test IIT in a laboratory setting. Yet, the Krohn-Rhodes theorem guarantees that whatever the results of these experiments are, it is possible that the opposite results exist, as what is being measured internally has nothing to do with the input-output behavior of the system. Thus, there is no reason to experimentally test IIT, as it is already falsified a priori...
Resolution
Fortunately, at the same time I was submitting my paper on Krohn-Rhodes decomposition, the much more popular "unfolding argument" was published by Doerig et al., which essentially articulated the same points I was trying to make but with fewer technical details and a much cleaner narrative. In short, the unfolding argument is that feed-forward neural networks (NNs) can realize the same input-output behavior as recurrent neural networks, and therefore one can fix the input-output behavior of a system while changing its Phi value (the primary difference between their work and mine being the use of NNs instead of deterministic finite-state automata). More importantly, Doerig et al. clearly articulated that this implies one of two possibilities: either the theory is falsified due to the fact you can get arbitrary Phi values for fixed behavior, or the theory is inherently unfalsifiable (if one insists that Phi can be used to justify the difference in subjective experience under fixed input-output conditions). It was this latter implication that I had really struggled to pin down. I was aware of the fact that Phi was changing without clear justification in terms of behavior, but I didn't clearly recognize that this meant the theory is metaphysical if one continues to insist that Phi is the true solution. In other words, I was trying to figure out how to convince believers that Phi is incorrect when in reality the best I could ever do is convince them it is unscientific.
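The core of unfolding can be shown with something much smaller than a neural network. The sketch below is an illustration under stated assumptions, not Doerig et al.'s construction: it uses a single XOR-based parity bit instead of NNs, and the bound T on input length is a made-up parameter standing in for the fixed depth a physical feed-forward circuit would have.

```python
from functools import reduce
from itertools import product
from operator import xor

T = 6  # assumed upper bound on input length for the unrolled version


def recurrent_parity(inputs):
    """One stored bit that is fed back into itself at every step."""
    state, outputs = 0, []
    for x in inputs:
        state ^= x
        outputs.append(state)
    return outputs


def unrolled_parity(inputs):
    """No stored state: output t is computed directly from inputs 0..t."""
    return [reduce(xor, inputs[: t + 1]) for t in range(len(inputs))]


# Check agreement on every binary input sequence of length 1 to T.
agree = all(
    recurrent_parity(seq) == unrolled_parity(seq)
    for n in range(1, T + 1)
    for seq in product([0, 1], repeat=n)
)
print(agree)
```

The recurrent version carries memory in a feedback loop, while the unrolled version recomputes everything from the raw inputs, yet no experiment restricted to inputs and outputs can tell them apart.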
Generalizations of the unfolding argument quickly followed, in which the role of inference was clearly defined [Kleiner and Hoel, 2020]. Crucially, what is needed to test a theory of consciousness are results from an independent inference procedure, such as the inference that sleep is indicative of lower levels of subjective experience. This inference must be made independent of any theoretical framework and used as the benchmark to which predictions from a given theory are compared. If the prediction from the theory doesn't match the results from the inference procedure (e.g. the theory predicts high consciousness when asleep and low consciousness when awake) then the theory is falsified. Furthermore, if one assumes that independent inference procedures are based on input-output behavior such as sleep (an assumption that seems unavoidable) then the ability to vary the prediction from a theory of consciousness under fixed input-output automatically implies the theory is falsified as at least one of the predictions is logically guaranteed to disagree with the results from the inference procedure.
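The logic of this last step is simple enough to spell out in a few lines. The following is a toy rendering of the argument, not Kleiner and Hoel's formalism: the two systems, the `has_feedback` flag, and the stand-in theory are all hypothetical, and the inferred verdict is passed in directly because, by assumption, it depends only on behavior and the two systems behave identically.

```python
# Two hypothetical systems with identical input-output behavior
# but different internal structure.
SYSTEMS = [
    {"name": "recurrent", "has_feedback": True},
    {"name": "unfolded", "has_feedback": False},
]


def theory_predicts(system):
    """Toy theory that treats feedback as necessary for consciousness."""
    return "conscious" if system["has_feedback"] else "not conscious"


def mismatching_systems(inferred):
    """List the systems whose prediction disagrees with the inferred verdict.

    Because the inference sees only behavior, and behavior is identical,
    both systems necessarily receive the same inferred verdict.
    """
    return [s["name"] for s in SYSTEMS if theory_predicts(s) != inferred]


for verdict in ("conscious", "not conscious"):
    print(verdict, "->", mismatching_systems(verdict))
```

Whichever verdict the inference procedure returns, one of the two predictions disagrees with it, which is the sense in which the mismatch is logically guaranteed.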
With these new results (formalization of unfolding and falsification) in hand, I was able to ground everything I had done in terms of these increasingly familiar formalisms. For example, I could now prove that the Krohn-Rhodes theorem falsifies any theory of consciousness that assumes feedback as a necessary condition, such as IIT. Thus, I was finally able to translate the intuition that motivated my original line of arguments into concrete mathematical proofs - solidifying to myself that IIT is indeed ill-fated.
On the Future of IIT
Going forward, it seems the need for an independent inference procedure based on input-output behavior is a major epistemological concern that theories of consciousness must contend with. If behavior is the ultimate arbiter, then theories of consciousness must be invariant with respect to fixed input-output behavior if they are to avoid a priori falsification. In other words, falsifiability boils down to what we can infer from behavior, and what we can infer is pretty much limited to our "folk psychology" understanding of what behaviors are and are not associated with consciousness. For this reason, I do not think the future of consciousness research is bright, as rich mathematical theories must be given up in favor of behaviorally falsifiable theories - a throwback to the Turing test.
As a case study, however, IIT remains extremely interesting to me. How is it that the theory is so popular given that it is both quantitatively and qualitatively so poorly defined? Even in light of the unfolding argument and mathematical proofs of falsification, I don't see proponents of IIT giving it up any time soon. I have met many of them in person, and for whatever reason IIT seems to be a disproportionately large part of their identity - certainly much more so than any other theory I've come across. Of course, this is not unanimous, and I have found plenty of people on both sides of the debate willing to discuss IIT with an open mind, but there is certainly a core of proponents of IIT that talk of constellations of concepts in qualia space with an air of superiority that makes you feel like they must know something you don't in order to justify such a seemingly strong belief in their theory.
But the math doesn't lie, and I'm now convinced that this is some sort of social or psychological phenomenon in which the culture of IIT attracts believers for reasons that are anything but scientific - perhaps by promising an answer to one of the most difficult existential questions. While I don't personally like or approve of this approach to science, I can't help but recognize that one of the reasons we have come so far, so quickly, in formalizing the epistemic problems surrounding consciousness is that seemingly solid arguments against IIT did nothing to detract from its fan base. Had I been the inventor of IIT, the existence of philosophical zombies would have been a deal-breaker for me, as there are strong logical indictments against any theory that admits them [e.g. Harnad 1995] and IIT is clearly not immune to these arguments. But proponents of IIT refused to accept these indictments and, in doing so, forced stronger and stronger logical arguments out of those who dissent. Thus, IIT's resistance to being thrown away demanded that the best possible arguments be brought forth against it, and these arguments, in their own right, apply much more generally than IIT.
In light of this, I am curious to see what happens to IIT in the next five years. Ideally, I would hope to see an abrupt decline in the use of Phi and an increased emphasis on the problems associated with the epistemology of consciousness, signaling the acceptance of the unfolding argument. However, I am not naive about the institutional inertia behind this theory. Not only are millions of dollars being spent to test it in a lab, but dozens of graduate students, postdocs, and professors have devoted significant time to pushing this theory forward in some form or another. To accept the notion that one must go back to step one (framing the problem) after so many years of hard work is a difficult thing to do. This, in combination with the historical tendency for proponents of IIT to pivot rather than truly address contradictions in the theory, makes me think it is equally likely that Phi lives on under a different mathematical guise with equally fatal problems buried under a mountain of confusing jargon. If this is the case, I probably will not give IIT the benefit of the doubt again.