Is there a dark side to data?

As much as we laud the potential of Big Data and Data Science, there is a caveat to using them. In an interview with Fortune, Cathy O’Neil, the author of Weapons of Math Destruction, highlights an important point about data algorithms: an algorithm is only as good as the data that goes into it. If the data fed into an algorithm is biased, the output will inevitably reflect that bias.

While data’s benefits are widely agreed upon in business, science, and medicine, its consequences can be harmful in other areas of society, specifically the social and political order. As Harvard Law School professor Lawrence Lessig put it at the Cloudflare conference in September 2017, technology can be “the best and worst of times at the same time.” There is no better case for studying the implications of Data Science in politics than the 2016 presidential election in the United States.

The campaign that broke American politics

Does that title seem absurd? It may not be. In a column Chuck Todd and Carrie Dann published on NBC News’s website in March 2017, the two opened by declaring: “[b]ig data revolutionized the way American politicians win elections. In the process, it broke American politics.” How, exactly?

Instead of following the traditional strategy of targeting the voters in the middle, big data and the technology that comes with it allow campaigns to “…craft very different versions of a candidate’s views for different groups. So one candidate could push himself as a populist to one sub-group of voters, a fiscal conservative to another, and perhaps a right-wing reactionary to yet another.” Lessig says this strategy is feasible “…because there is no longer a shared understanding of the world as there was 30 or 40 years ago when most Americans got their news from a handful of TV networks. The factionalization of news sources has also factionalized the electorate.” Todd and Dann concur, concluding (perhaps too simplistically for some) that “…data could be used to activate every possible base voter and build a partisan firewall.”

Indeed, Eitan Hersh, a professor at Yale University, studied how campaigns capitalize on data, and the microtargeting it makes possible, in his book Hacking the Electorate. His conclusion is this:

The hack consists in asking — and in discovering — what data can be used to manipulate or mobilize voters. If you have enough data, you can predict how people will behave, how they will vote. Campaigns have developed sophisticated ways of doing this, and the growing availability of data is accelerating that process.

In the political context, Hersh uses the term big data to mean data covering the entire electorate rather than samples, which past campaigns relied on to predict voting patterns. With more public data at their disposal, campaigns can tap into more information than ever before.

Both the Clinton and Trump campaigns incorporated Data Science into their efforts. The use of that data has had (and continues to have) severe consequences for how the US political apparatus runs. It has also affected how the country is perceived around the world, and that perception has changed for the worse.

Both campaigns illustrate Hersh’s thesis that “[b]ig data makes it easy for candidates to dismiss their opponents. They now know, with greater and greater precision, how people voted and how they’re likely to vote in the future, and their campaigns reflect that.”

A Tale of Two Data-Driven Campaigns

Clinton

In its whitepaper on the data scientist, Good Rebels rightly noted that most Americans wouldn’t recognize the name Elan Kriegel. Yet millions of them (including the author of this post) were in the sights of Kriegel and his team, because he was the Director of Analytics for Hillary Clinton’s presidential campaign. In fact, campaign manager Robby Mook called Kriegel the campaign’s “invisible guiding hand” in an interview with Politico less than two months before the general election.

Kriegel’s team, comprising over 60 analysts and mathematicians, had an algorithm “…that determined…where almost every dollar of Clinton’s more than $60 million in television ads was spent during the primary.” Beyond that, the output of the algorithm and its predictive models determined every decision made during the campaign. This ranged from the precise selection of which voters to reach through emails, phone banking, or canvassing to the Facebook ads directed at subsets of the population through microtargeting.

Although her campaign invested in its data operation, Clinton later criticized the data infrastructure the Democratic National Committee handed her after she won the nomination. Calling it “nonexistent,” she said the Trump campaign had an advantage in the foundation it inherited from the Republican National Committee, which had built it after losing the 2012 election.

Even with this sophisticated algorithm, what led to Clinton’s defeat? For Hersh, it comes back to how campaigns now place a laser-sharp focus on the voters likely to support them and neglect the rest of the electorate: “I think this is a big reason why Hillary Clinton lost. They banked on a campaign focused on mobilizing voters who voted for Obama and did not engage in the kind of persuasion that might have shifted some of these rural voters to their side.” Recall, too, O’Neil’s argument that biased data going into an algorithm can produce a skewed outcome. It is entirely possible that this is where Kriegel’s algorithm was not as foolproof as it was made out to be.

Trump

Rather than building an in-house team, the Trump campaign enlisted the services of Cambridge Analytica, a data science firm that gains insights into consumer preferences. To do this, it runs analytics on data from internal, public, and private sources to determine what people like and what they do not. What does it ultimately do with that information? Fortune explained, “CA uses that data to come up with lists of people that a client might want to target to sell a product—or pitch a candidate.” So how does CA Chief Technology Officer Darren Bolding say the company applies this philosophy to the political domain? In his words, “…[i]n the case of politics, we see this person’s propensity to vote and this is the candidate they are most likely to be interested in…”

How did the firm get the job done? Jared Kushner, Trump’s son-in-law, who was heading up the campaign’s digital arm at the time, brought the firm in along with the campaign’s digital expert, Brad Parscale. Cambridge Analytica’s head of digital, Molly Schweikert, described the situation when the firm joined the campaign: “[i]t became obvious that a sophisticated data apparatus would be needed to combat the years of infrastructure and experience the Clinton campaign had been building up[.]” Not only did the firm use its own databases, it also relied on data from publishers like Politico, as well as from Facebook.

While many details about the Trump campaign remain under investigation, one thing about its digital strategy is certain: “Trump’s digital operation was shockingly effective.” Vox reported that “Samuel Woolley, who heads the Computational Propaganda project at Oxford’s Internet Institute, found that a disproportionate amount of pro-Trump messaging percolated via automated bots and anti-Hillary propaganda. Trump’s bots, they reported at the time of the election, outnumbered Clinton’s five to one.”

The online ads the campaign ran alongside its social media bots adjusted based on how their targets responded to them. This played to Cambridge Analytica’s strength in building richer psychological profiles. The campaign subsequently arranged Trump’s agenda around those profiles, deciding both where he visited and what type of speech he would give in specific areas. These initiatives appear to have made the difference in the few swing states where narrow margins decided it all.

What’s the lesson in all of this?

To Eitan Hersh, “[i]f there’s one thing we learned in 2016, it’s that there is no such thing as a firewall. There is still a price to be paid for inefficient governance.” It is also true that, in Washington’s current climate, this gravitation toward the base remains the dominant tactic even between campaigns. That is why, as Hersh describes, many members of Congress view sticking to their partisan principles, rather than reaching across the aisle for compromises, as the best way to hold onto their seats. They may well have arrived at that conclusion because that is what their data is telling them to do.

But Hersh doesn’t think we should worry yet: “[t]here is still an open political process with market incentives, and when a politician is too misaligned with her district, the market will deliver an alternative in the next election.”

Data and the microtargeting it enables let campaigns learn more about voters, and more efficiently, than ever before. Even so, at the ballot box voters will still choose the candidate whose message and substance most align with their priorities.

Ready to look at data from all sides?

To learn more about the IE Data Science Bootcamp, download your copy of our informational booklet here. And, if you’re ready to apply for our next intake, get started on your application.