ARTIFICIAL INTELLIGENCE FOR
INVESTIGATIVE REPORTING
Using an expert system to enhance
journalists’ ability to discover original public
affairs stories
Meredith Broussard
This paper describes an artificial intelligence-based software system that augments public
affairs reporters’ ability to sort through data and identify investigative storytelling opportuni-
ties. A prototype of the model was developed and was used to analyze education data. The
successful prototype and the social impact of the stories derived from the prototype suggest
this approach as a valid option for newsrooms that seek to tell more compelling, data-rich
stories about public affairs issues.
KEYWORDS artificial intelligence; computational journalism; data journalism; expert
systems; innovation; public affairs journalism
Introduction
“Readers don’t care about bureaucracy,” one of my colleagues tells her students
on the first day of her public affairs journalism class. “To make people care about public
affairs, you have to tell a story that taps into our shared humanity.” The work of telling
routine public affairs stories becomes second nature to a beat reporter. But for an
investigative reporter, storytelling requires an additional layer of cognitive complexity.
The investigative reporter must come up with an original idea—a creative act—and
must then find sources and turn the idea into a narrative. Ideas are easy to generate.
Original ideas are much harder. Original ideas that can turn into successful investigative
stories are even more difficult to create. Once the idea exists, the timeline is uncertain:
investigative stories can take a very long time to conceive and report. Many of today’s
economically challenged newsrooms do not feel they can afford such a luxury. While a
computer cannot generate original story ideas, computational methods for accelerating
human creativity offer a possible solution for newsrooms seeking to amplify their inves-
tigative reporting capacity. This paper describes a model for leveraging artificial intelli-
gence to accelerate the process of discovering investigative ideas on public affairs
beats such as education, transportation, or campaign finance.
Digital Journalism, 2014
http://dx.doi.org/10.1080/21670811.2014.985497
© 2014 Taylor & Francis
Downloaded by [Temple University Libraries] at 08:32 15 December 2014
The model, which I call the “Story Discovery Engine,” derives from a type of artifi-
cial intelligence software called an expert system. In this paper, I explain how the
engine works to facilitate the discovery of investigative ideas. First, I outline the con-
ceptual process involved in generating new investigative story ideas. I describe expert
systems, outline one of the logical rules embedded in the software, and show the dif-
ference between a classical expert system and the Story Discovery Engine. I demon-
strate how I tested the system by developing a prototype of the Story Discovery
Engine that analyzed education data from the School District of Philadelphia, the
eighth-largest school district in the United States. That prototype was published online
as a project called “Stacked Up.” Stacked Up consists of a set of investigative stories as
well as a reporting tool made of dynamic, customizable data visualizations inside a nar-
rative framework. I summarize the investigative stories that were produced from the
reporting tool and the policy changes that resulted. The implementation and resulting
investigative news stories, plus the project’s impact, suggest that the Story Discovery
Engine model can add value to investigative reporting.
Creativity and Investigative Story Ideas
Social scientists have engaged with the notion of investigative reporting as a
cultural construction produced inside a particular organizational culture (Gans 2004;
Tuchman 1978). For the purposes of this paper, investigative reporting is defined as a
type of enterprise journalism that is produced over time, outside of the day-to-day
deadline crunch, and includes diverse sources (Hansen 1991). The cognitive process of
coming up with an original investigative story idea is a creative act under Sternberg’s
(1999) definition of creativity as the production of work that is both novel (as in
original) and appropriate (as in useful).
Experienced investigative reporters build up a set of strategies for finding story
ideas, but novice investigative reporters often struggle to find opportunities for novel
enterprise stories. Training and education materials for novice investigators focus on
places to look for stories: follow the money, look at specific lines on financial filings,
and so on.1
The complexity of the process is part of the reason that so much investiga-
tive journalism is reactive, resulting from a tip from a whistle blower, rather than proac-
tive (Protess 1991).
Journalism innovation theorists have suggested that tremendous possibilities exist
in analyzing data to find investigative ideas (Appelgren and Nygren 2014; Dick 2013;
Flaounas et al. 2013; Pavlik 2013). Hamilton and Turner (2009) write that the future of
watchdog journalism may be found in using algorithms (precisely defined problem-
solving procedures) for accountability:
The best algorithms will essentially be public interest data mining. They will generate
leads, hunches, and anomalies to investigate. It will remain for reporters and others
interested in government performance to take the next step of tracking down the story
behind the data pattern.
Tracking down a story in data requires specialized technical skills (to do the data-
crunching) as well as journalistic expertise (to refine the story idea and craft appropriate
prose). These skills until recently have tended to be segregated into different job
categories and experience levels. A novice reporter might have sufficient technical skills
to use pivot tables in a spreadsheet, for example, but might not have sufficient job
experience to know that pivot table analysis could be applied to monthly government
data releases on a particular beat. The promise of computational journalism is that such
walls would be broken down through collaboration and training (Flew et al. 2012). A
successful computational journalism project might thus be described as one that uses
computational thinking to bridge a knowledge gap.
This knowledge gap between the experienced and the novice reporter involves
two types of knowledge: formal and informal (Scribner and Cole 1973). Formal knowl-
edge includes rules of a system, as in knowing the rules of English grammar. An experi-
enced education reporter has formal knowledge of his or her state’s laws and policies
around education. Informal knowledge includes domain expertise and rules of thumb
based on experience. Informal knowledge for an experienced investigative reporter
might include a rule of thumb like this:
If you have a natural disaster like Hurricane Sandy, and there is a big pool of money
for hurricane relief, some of those funds will be misused; after a natural disaster, always
follow up and find out where things went wrong with the government funds, and
you’ll find a story.
To come up with ideas the way an experienced reporter would, the novice reporter
needs the informal knowledge that the experienced reporter has about where to find
stories plus some of the formal knowledge about education policies.
Origin of the Project
In 2011, I found myself staring into exactly this type of knowledge gap. I was an
experienced reporter, but not on the public affairs beat. I wanted to investigate a ques-
tion in education: do Philadelphia public school children have enough books to learn
the material on the state-mandated standardized tests? I had data, I had methods, but I
did not have contacts. I wanted to talk to parents, teachers, and students at the city’s
best schools, and the city’s worst schools, and see if there was a difference in the stu-
dents’ access to books. To do that, I needed to figure out which were the best schools,
and which were the worst schools; I also needed to find people to talk to at each.
There were more than 200 schools. The task was daunting.
Educational data is abundant, but the specific analysis I wanted had not been
done before. It also involved numerous interdependencies and micro-judgments. To
investigate the story I wanted to write, I turned to data journalism.
Data journalism is the practice of finding stories in numbers, and using numbers
to tell stories (Broussard, quoted in Howard 2014). It is an evolving practice (Appelgren
and Nygren 2014) that may also be called data-driven journalism or computational jour-
nalism. Public affairs reporting is particularly suited to data journalism, and specifically
expert system analysis, because public affairs reporting depends on interpreting the
rules of a local system. An education beat reporter must be familiar with a dizzying
array of laws and policies at the federal and state level. Fortunately, these laws and pol-
icies are articulated in text-based rules that are easily available online. The government
uses data to track the success of its programs, and that data is frequently published
online. Other data sets are available to reporters and citizens under the Freedom of
Information Act of 1966. Clearly articulated rules in the real world can be translated
easily into computer logic rules. Applying the rules to the data allows the computa-
tional “intelligence” to uncover social problems.
Thus, the first step was creating a software system that would do some of the
necessary investigative thinking for me. Embedding formal and informal knowledge
into the software would allow me (or any other reporter) to use the software as a
reporting tool to refine story ideas and more efficiently find sources.
This is the essence of the Story Discovery Engine. It is possible to take some of
the experienced reporter’s knowledge, abstract out the high-level rules or heuristics,
and implement these rules inside an expert system in the form of database logic. The
data about the real world is fed in, the logical rules are applied, and the system
presents the reporter with a visual representation of a specific site within the system.
The Prototype
An investigation often arises when a reporter perceives a difference between
what is (the observed reality) and what should be (as articulated in law or policy). A
high-impact investigative story looks at a situation where what is differs from what
should be, and explains why. The reader can then use the narrative to create or enact
a path to remedy the situation.
The idea for Stacked Up arose from just such a difference. “The school is terrific,”
my neighbor said of her daughters’ public school, considered one of the best in the city.
“But if you’re a parent there, you have to be prepared to do a lot of fundraising for basic
things like textbooks.” A few years later, I noticed that I was getting the same email at
the beginning of every semester from the students in my college classes: it said that the
student was very sorry, but he or she could not do the homework because the course
books had not yet arrived in the mail. Those students always seemed to be the students
who received the lowest grades at the end of the semester. It made sense: they could
not do the work required to pass the class if they did not have the books. I wondered:
could book shortages be a factor in Philadelphia public schools’ consistently low stan-
dardized test scores? (Many parents do not have the resources to fundraise to get books
for a school—my neighbor is an outlier, as are many of the other parents at that particu-
lar school.) The District currently has 131,262 students in grades pre-kindergarten
through 12, 87.3 percent of whom are economically disadvantaged. This is a significant
issue because even if parents at each school fundraised, they might not be able to raise
enough money to buy all of the books needed.
Most people would be surprised at the idea that a public school would not have
enough books. After all, Pennsylvania law specifically says that the state provides books.
In Philadelphia, however, students and parents regularly complain of textbook short-
ages. A 10th grader at Parkway West High School told me that students often have to
share books in class and cannot take them home to do homework. Many books are in
poor condition: “There were pictures of testicles drawn on every page,” she said of one
of her ninth-grade books. The logistical challenges of getting multiple books to hun-
dreds of thousands of students at hundreds of schools overwhelm many major school
districts (Labbé and Haynes 2007).
Access to books is particularly critical because a school today is labeled a success
or failure based on students’ performance on high-stakes tests. The tests are highly spe-
cific and are aligned with state educational standards. The tests are also aligned with
the textbooks sold by the three educational publishers that dominate the educational
publishing market. These same publishers design and grade the standardized tests. It
therefore stands to reason that if students do not have the right textbooks, they will
not be able to do well on the tests even if they want to.
Answering the question whether a single school has enough books is complex
because each student in each grade studies at least four subjects every year. Asking if
there are enough books in an entire school district is a massive task. With more than
200 schools, the School District of Philadelphia is the eighth-largest school district in
the country. Many of the schools have high student turnover because students switch
schools as they navigate the child welfare or juvenile justice systems (Department of
Human Services, City of Philadelphia 2012). The Children and Youth Division of the
Philadelphia Department of Human Services serves an estimated 20,000 children and
their families each year (Department of Human Services, City of Philadelphia 2014).
This background helped to pose what became the central research question: are
enough books available for Philadelphia students to allow them to prepare adequately
for state-mandated standardized tests?
I designed an algorithm and a database architecture that would let me calculate
the answer to my investigative question. The algorithm is designed to check whether
students are provided with the materials specified in the rules of the educational
system. If they are not, there is likely to be a violation, and there is probably an oppor-
tunity for a story.
Implementing the Prototype
The Story Discovery Engine prototype launched online as a project called
“Stacked Up.” It has two parts: it is both a reporting tool and a presentation system for
the stories I wrote using the reporting tool. The presentation system provides the user
with a set of investigative stories and some explanatory text about the project (see
Figure 1). The reporting tool is a set of dynamic data visualizations that allowed me to
write the investigative stories. The statistics and data that supported each story were
original, derived from the data analysis resulting from the algorithm that forms the
backbone of the project.
In the reporting tool view, the reporter sees a page representing a single school.
The page shows different types of data, organized so that specific types of investigative
questions can be easily answered (see Figure 2). Some such questions include:
• How many students are in each grade in this school?
• Where is the school located in the city?
• How do this school’s test results compare to the rest of the district?
• Do there seem to be enough books for the students enrolled?
The system design anticipates the data points that a reporter needs to write a
data-rich story and presents them in a centralized, easy-to-navigate format. The reporter
leverages their domain expertise, clicks around to adjust some what-ifs to prompt the
creative process, and comes up with a story idea. Because the story idea is targeted, it
immediately becomes easier to identify appropriate sources.
FIGURE 1
Presentation system and reporting tool shown on project home page
FIGURE 2
Reporting tool view
The key is that the software does not try to solve a problem faced by all journal-
ists on every beat. It tries to solve a specific problem on a specific beat, and in the pro-
cess creates a way to solve other problems on that same beat. The Story Discovery
Engine prototype was created and applied to education data, but the model can easily
be applied to other beats as well.
A list of the rules used in the system is beyond the scope of this paper, as is a
depiction of the object model used to represent relationships between the entities
involved; however, additional technical details are available by request. For the sake of
description, however, one of the rules could be explained as follows:
Core_subjects = math, reading, social studies, science.
School_curriculum = a curriculum package published by a major educational pub-
lisher (e.g., “Everyday Math”).
Necessary_material = the minimum books or workbooks necessary to teach the
school’s curriculum package. This often means two items: a textbook and workbook.
For each school in School_District
    For each grade in school
        For each Core_subject
            For each Necessary_material in School_curriculum
                If NumberOf(students_in_grade) = NumberOf(necessary_material)
                Then Enough_materials = yes
                Else Enough_materials = no.
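The rule above can be sketched in Python as a rough illustration. This is not the Stacked Up code: the record layout (`MaterialCount`), the function names, and the sample figures are all hypothetical. The sketch also makes one assumption that departs from the rule as printed: it treats a surplus of books as sufficient, so it tests for "at least as many copies as students" rather than strict equality.

```python
from dataclasses import dataclass

CORE_SUBJECTS = ("math", "reading", "social studies", "science")


@dataclass(frozen=True)
class MaterialCount:
    """Inventory count for one necessary item (hypothetical record layout)."""
    school: str
    grade: int
    subject: str
    material: str
    copies: int


def check_enough_materials(enrollment, inventory):
    """Apply the rule to every inventory record for a core subject.

    enrollment maps (school, grade) -> number of students.
    Returns a dict keyed by (school, grade, subject, material) -> bool.
    """
    results = {}
    for item in inventory:
        if item.subject not in CORE_SUBJECTS:
            continue  # the rule only covers core subjects
        students = enrollment[(item.school, item.grade)]
        # Assumption: a surplus also counts as "enough", so this uses >=
        # where the rule as printed uses strict equality.
        key = (item.school, item.grade, item.subject, item.material)
        results[key] = item.copies >= students
    return results


def story_leads(results):
    """Return the keys where the rule fails: candidates for reporting."""
    return sorted(key for key, enough in results.items() if not enough)


# Made-up sample data for illustration only.
enrollment = {("School A", 3): 60, ("School A", 4): 55}
inventory = [
    MaterialCount("School A", 3, "math", "textbook", 60),
    MaterialCount("School A", 3, "math", "workbook", 41),
    MaterialCount("School A", 4, "reading", "textbook", 70),
    MaterialCount("School A", 4, "art", "sketchbook", 0),
]
results = check_enough_materials(enrollment, inventory)
leads = story_leads(results)
print(leads)
```

A failed check is only a lead, not a finding: the reporter still has to verify it with people at the school. Note also that this sketch only evaluates materials that appear in the inventory, so a book that is missing from the records entirely would need a separate check.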
Once the prototype existed, I looked at the data analysis and interviewed people
to validate the findings. I developed hypotheses, reported them out, revised the
hypotheses, and considered story formats as part of a months-long process. As pre-
dicted, the data revealed multiple potential stories about how books were “stacked up”
in Philadelphia city schools.
Theoretical Background
The Story Discovery Engine draws on adjacent, occasionally overlapping concepts
from the fields of communication, cognition, and computation. I will explain each in
turn and how it relates to the Story Discovery Engine. These fields are not generally
placed in dialogue with each other, but there are enormous productive possibilities if
they are put together in conversation.
Computation
The Story Discovery Engine software belongs to a class of artificial intelligence
programs called knowledge-based expert systems. Benfer offers an excellent definition:
Expert systems are computer programs that perform sophisticated tasks once thought
possible only for human experts. If performance were the sole criterion for labelling a
program an expert system, however, many decision support systems, statistical analy-
ses, and spreadsheet programs could be called expert systems. Instead, the term
“expert system” is generally reserved for systems that achieve expert-level performance,
using artificial intelligence programming techniques such as symbolic representation,
inference, and heuristic search (Buchanan 1985). Knowledge-based systems can be dis-
tinguished from other branches of artificial intelligence research by their emphasis on
domain-specific knowledge, rather than more general problem-solving strategies.
Because their strength derives from such domain-specific knowledge rather than more
general problem-solving strategies (Feigenbaum 1977), expert systems are often called
“knowledge-based.” Since the knowledge of experts tends to be domain-specific rather
than general, most expert systems representing this knowledge reflect the specialized
nature of such expertise. (Benfer 1991, 4)
Benfer argues that expert systems can provide an important mechanism for prompting
new social science thinking, and expert system developers can learn from social scien-
tists’ rigorous methods of data collection and validation. He was the first to deploy an
expert system in journalism:
MUckraker, an expert system under development by New Directions in News and the
Investigative Reporters and Editors Association at Missouri University, is a program to
advise investigative reporters on how to approach people for interviews, how to prepare
for those interviews, and how to examine a wide range of public documents in the con-
duct of an investigation. This program is designed to act much as an expert investigative
reporter might, advising the user on strategies to try when sources are reluctant to be
interviewed, pointing out documents that might be relevant to the investigation, and
advising the user on how to organize his or her work. (Benfer 1991, 4)
Under the expert system model Benfer describes, the expert system would deliver to
the reporter “advice” about whether the quantity of books in a school would be the
appropriate basis for a story.
The innovation in the Story Discovery Engine is that instead of advice, the expert
system delivers an interactive data visualization. The data visualization is specifically
designed to answer the most common questions a reporter might ask in order to
assess whether a story might be found at a particular school.
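The difference between delivering advice and delivering a view can be illustrated with a small Python sketch. It assumes a hypothetical per-school record and a made-up function, `build_school_view`. Neither is taken from the Stacked Up code base. The function gathers the data points behind the reporter's common questions and returns them as plain data for display, without issuing any recommendation.

```python
from statistics import mean


def build_school_view(school, district_scores):
    """Collect the data points a reporter would scan for one school.

    Returns plain data for display; it deliberately makes no
    recommendation about whether a story exists.
    """
    total_students = sum(school["enrollment"].values())
    total_books = sum(school["books"].values())
    return {
        "name": school["name"],
        "location": school["location"],
        "students_by_grade": dict(school["enrollment"]),
        "score_vs_district": round(school["score"] - mean(district_scores), 1),
        "books_per_student": round(total_books / total_students, 2),
    }


# Made-up sample values for illustration only.
school = {
    "name": "School A",
    "location": (39.95, -75.16),             # hypothetical coordinates
    "enrollment": {3: 60, 4: 40},            # grade -> students
    "books": {"math": 150, "reading": 100},  # subject -> copies on hand
    "score": 48.0,                           # hypothetical proficiency percentage
}
view = build_school_view(school, district_scores=[48.0, 55.0, 62.0])
print(view)
```

The judgment about newsworthiness is left out of the function on purpose. In this model that step belongs to the reporter looking at the view.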
I decided that using the human reporter’s judgment was more efficient than a
computer’s for assessing newsworthiness in this case because the system is designed
to be used in the deadline-driven, time-sensitive environment of a newsroom. The
notion that computer-based quantitative methods should augment humans, not
replace them, is one of the principles of automated text analysis put forward by
Grimmer and Stewart (2013) in their analysis of possible pitfalls in automated content
analysis. In recent years, communication scholars have frequently used the human
workers who participate in Amazon’s Mechanical Turk in order to code content in large
data sets. In the Story Discovery Engine model, the reporter is a similarly essential part
of the system (see Figure 3).
Using the vast “computational” resources of the human brain, the reporter takes
only moments to look at the data revealed by the system, leverage formal and informal
knowledge, and make a judgment about the likelihood of a story. It would require vast
amounts of computing power to get the computer to draw the same conclusions; also,
it would take years to tease out all of the subtleties of human news judgment and
implement them computationally. The human brain thus becomes an efficient part of
the story-generating process, aided and augmented by the computational system.
It is significant that Benfer used social science methods in crafting an expert sys-
tem for journalism. Social science thinking is at the heart of what today we call data
journalism. Meyer pioneered the application of social science methods to journalism in
his 1967 Pulitzer Prize-winning story about race riots in Detroit; those methods were
later codified in Precision Journalism: A Reporter’s Introduction to Social Science Methods
(Meyer 2002). Precision journalism methods informed computer-assisted reporting,
which flourished in the 1980s with the advent of desktop computers in the newsroom.
Today’s online data journalists are incubated and organized by the Investigative
Reporters and Editors Association through the National Institute for Computer-Assisted
Reporting, which offers the Phil Meyer Reporting Award for a data-driven project each year.
FIGURE 3
A classical expert system compared to the Story Discovery Engine
Three other computation concepts deserve mention: open data, open source, and
big data. Data journalism can only flourish if data sets are available. Structural changes in
the US government have allowed data to be more freely distributed. Influenced in part
by the open data movement, President Barack Obama (2009) released a memorandum
declaring a new openness around data access and availability. “My Administration is
committed to creating an unprecedented level of openness in Government,” it reads.
Information maintained by the Federal Government is a national asset. My Administra-
tion will take appropriate action, consistent with law and policy, to disclose information
rapidly in forms that the public can readily find and use. Executive departments and
agencies should harness new technologies to put information about their operations
and decisions online and readily available to the public. (Obama 2009)
The idea is that citizens can take government data and analyze it to increase transpar-
ency and accountability. The Story Discovery Engine is an intentional system: its analysis
is presented with the intent of increasing government accountability. It is nonpartisan
software, but it proceeds from the assumption that there are problems in the social
system that need to be exposed through the available data.
Open data is often mentioned in conjunction with open source software tools.
Stacked Up was implemented using almost exclusively open source tools. It consists of
43,000 lines of code, all of which are available on an open source version control site
called GitHub. Just like the data it analyzes, the software is publicly available for anyone
to peruse and fact-check. This adds an extra layer of transparency to a transparency-
producing activity.
It is worth mentioning at this point the relationship between software tools and
reporters’ productivity. Several Web-based tools have been developed to help journal-
ists be more efficient at their investigative tasks. Tabula, for example, turns PDFs into
text. One of the most consistent points of conflict between reporters and officials is the
way that the officials provide information. Entire books have been written about the
nuances of negotiating for access to public records (Cuillier 2011; Marburger 2011). A
successful tool for investigative journalism allows reporters to surmount common diffi-
culties that interfere with reporting. Likewise, several data visualization tools have
become popular to use on structured data. Putting census data into a data visualization
tool like Tableau, which displays maps and bubble charts and other forms, allows the
reporter to see patterns that would otherwise be invisible.
A small but growing subset of journalists is comfortable using data to enhance
their abilities to investigate stories. However, those reporters are limited to using the
number of data sets that they, or their newsroom team, can manage. Analyzing one
data set is usually enough for a story. Analyzing two or three data sets and turning
them into a story package requires a team that includes a programmer, designer,
writer, and editor (Domingo 2008; Parasie and Dagiral 2012; Royal 2010).
This is where big data comes in. The next frontier in investigative reporting is
using a computer to analyze multiple data sets at a time.
“Big data” means many things: lots of data (meaning a large quantity of data, as
in terabytes or yottabytes) or lots of different types of data (meaning a great number
of data sets) (boyd and Crawford 2012). Each is difficult in a newsroom. Newsrooms
tend to have minimal equipment (Domingo 2008), and it is hard to justify to an editor
why a reporter would need thousands of dollars’ worth of specialized equipment to
analyze terabytes of data. It is also hard to crunch a number of data sets in a news-
room because it requires computer-programming expertise. Reporters have to either
develop their own programming skills (which is difficult) or convince an editor to
devote in-house programming expertise to the project (which is also difficult, because
the few programmers in newsrooms tend to be overextended). Resource and personnel
shortages are practical reasons why big data analysis seldom happens in the
newsroom (Royal 2010).
A software system, properly implemented, can shortcut this long process and can
make more efficient use of limited newsroom developer resources. Stacked Up analyzes
15 data sets, which is more than a typical newsroom can handle given staffing and
time constraints. It took three developers six months to implement, which is more time
than can usually be devoted to a news development project. However, now that the
system architecture exists, the analysis can be replicated in other states or districts in a
matter of days or weeks, not months. The system is based on standardized data, which
(as the name suggests) does not vary significantly. This is consistent with a software
design principle of “write once, run anywhere.” Any newsroom can take the software,
analyze local data, and generate dozens of original investigative stories that matter to
the newsroom’s specific audience. The Story Discovery Engine is a tool to improve pro-
ductivity in both original investigative ideas and sources.
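Analyzing several data sets at once largely comes down to joining them on a shared key. The Python sketch below shows one way this could look, assuming two hypothetical CSV extracts (enrollment and book counts) keyed by school ID and grade. The column names, function names, and figures are invented for illustration and do not describe the 15 data sets used in Stacked Up.

```python
import csv
import io

# Made-up extracts standing in for two published data sets.
ENROLLMENT_CSV = """school_id,grade,students
101,3,60
101,4,55
102,3,30
"""

BOOKS_CSV = """school_id,grade,copies
101,3,60
101,4,20
"""


def load(text, value_field):
    """Read a CSV into {(school_id, grade): int(value)}."""
    rows = csv.DictReader(io.StringIO(text))
    return {(r["school_id"], int(r["grade"])): int(r[value_field]) for r in rows}


def join_on_school_grade(enrollment, books):
    """Left-join book counts onto enrollment; missing counts become None."""
    return {
        key: {"students": students, "copies": books.get(key)}
        for key, students in enrollment.items()
    }


joined = join_on_school_grade(
    load(ENROLLMENT_CSV, "students"), load(BOOKS_CSV, "copies")
)
for key, row in sorted(joined.items()):
    print(key, row)
```

Because the functions depend only on the column layout, pointing them at another district's files with the same layout would leave the logic unchanged. A `None` in the joined output marks a school and grade with no book records at all, which may itself be worth a question.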
Communication
The project derives from two significant theories about the future of news. The
first is the paradigm proposed by Remler, Waisanen, and Gabor (2013): that collabora-
tive efforts between journalists, programmers, academics, and foundations provide
opportunities for innovation. Stacked Up was created out of a partnership between a
nonprofit journalism organization under the aegis of Temple University’s Center for
Public Interest Journalism (CPIJ) and me, an independent journalist and academic. CPIJ
founded the organization with funding from the William Penn Foundation and the
Wyncote Foundation. The team also looked at best practices developed and publicized
by data journalism organizations. Data teams at ProPublica, the Chicago Tribune, and
the Washington Post all maintain “nerd blogs” that they use to communicate methodol-
ogy behind their data projects; methodologies are also discussed on Source, a data
blog maintained by the Mozilla Foundation.
The other significant theoretical concept behind Stacked Up is the notion of
accountability through algorithm. In “Accountability Through Algorithm: Developing the
Field of Computational Journalism,” Hamilton and Turner (2009) define computational
journalism (of which data journalism is a subset) as: “The combination of algorithms,
data, and knowledge from the social sciences to supplement the accountability
function of journalism.” They write that computational journalism has the potential to
help sustain watchdog reporting because it can “hold leaders accountable, unmask
malfeasance, and make visible critical social trends.”
Accountability through algorithm can mean reverse-engineering an algorithm to
discover how a company used an algorithm to influence the public (Diakopoulos 2013,
2014; Sweeney 2013) or it can mean designing an algorithm that is used to hold
decision-makers accountable. I employ the latter meaning.
Cognition
To understand the cognitive labor-saving dimension of the Story Discovery
Engine model, it is useful to consider the role of creativity in newsroom production.
Reporters use what López-Ortega (2013) calls “deliberate creativity” in order to create
original prose on deadline. Spontaneous creativity, or waiting for inspiration to strike,
does not allow reporters to meet the demands of the job. Reporters employ a set of
creative problem-solving strategies to generate ideas, create interview questions,
observe events, and synthesize this information into prose that conforms to the
appropriate publication style (Gans 2004; Tuchman 1978). Boden writes of the creative
process:
Creativity is a fundamental feature of human intelligence, and a challenge for AI [Artifi-
cial Intelligence]. AI techniques can be used to create new ideas in three ways: by pro-
ducing novel combinations of familiar ideas; by exploring the potential of conceptual
spaces; and by making transformations that enable the generation of previously impos-
sible ideas. (Boden 1998, 347)
Many human beings—including (for example) most professional scientists, artists, and
jazz-musicians—make a justly respected living out of exploratory creativity. That is, they
inherit an accepted style of thinking from their culture, and then search it, and perhaps
superficially tweak it, to explore its contents, boundaries, and potential. But human
beings sometimes transform the accepted conceptual space, by altering or removing
one (or more) of its dimensions, or by adding a new one. Such transformation enables
ideas to be generated which (relative to that conceptual space) were previously impos-
sible. The more fundamental the transformation, and/or the more fundamental the
dimension that is transformed, the more different the newly-possible structures will be.
(Boden 1998, 348)
A computer interface can provide the “fundamental transformation” that Boden calls
for:
It can be said that deliberate creativity is facilitated by objective manipulation of a con-
ceptual space. Also, the iterative process that triggers spontaneous creativity can be
promoted by computer programs that transform repeatedly interim creations, while a
creative subject judges their value. This iterative activity leads to preserve, change,
combine or erase parameters as thought convenient. Therefore, computer-assisted soft-
ware must facilitate both, deliberate and spontaneous creativity. To do so, cognitive
processes associated to creativity, as well as their complex interplay, must be character-
ized properly and then a computational solution can be proposed and implemented.
(López-Ortega 2013, 3460)
A computer-assistance tool to enhance creativity must possess algorithms that help
computing divergent exploration. The outcome of divergent exploration must be
unique ideas. In this sense, a software tool must help overcoming the inherent limits
of the individual for producing divergent solutions. (López-Ortega 2013, 3461)
The Story Discovery Engine helps the individual overcome “inherent limits” because it
analyzes more data sets than an individual could analyze alone. It tests levels of meaning
embedded in social rules: if we have ideals of equal access to education, and if we
have a public education system with standards, and if we have state-mandated assess-
ments that measure how well students have met those standards, and if we have
teachers who are provided with the standards, and if we grant that objects (books or
other learning materials) are necessary to practice the material and concepts associated
with the standards: is this an equal system? If not, do we have enough money to make
it equal? If not, what do we do? The rules embedded in the expert system correspond
to the rules articulated in laws and public policies. Ordinarily, only a subject matter
expert would be able to render judgments about whether a scenario is within the law
or not. The Story Discovery Engine makes some of these decisions for the reporter,
freeing the reporter up for higher-level cognitive imaginings.
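The chain of conditions above can be read as a single rule in an expert system. The following is a minimal sketch in Python, not the Stacked Up implementation: the record fields (`enrollment`, `books_on_hand`), the sample schools and figures, and the one-book-per-student threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SchoolRecord:
    """Hypothetical record; field names are illustrative, not from Stacked Up."""
    name: str
    grade: int
    enrollment: int      # students enrolled in the grade
    books_on_hand: int   # inventoried copies of the title the curriculum requires


def has_enough_books(record: SchoolRecord, books_per_student: float = 1.0) -> bool:
    """Assumed rule: each student needs access to the required book."""
    return record.books_on_hand >= record.enrollment * books_per_student


def flag_for_reporter(records):
    """Return the schools where the rule fails, i.e. possible story leads."""
    return [r for r in records if not has_enough_books(r)]


# Made-up sample data, for demonstration only.
sample = [
    SchoolRecord("School A", grade=5, enrollment=90, books_on_hand=95),
    SchoolRecord("School B", grade=5, enrollment=120, books_on_hand=0),
    SchoolRecord("School C", grade=5, enrollment=60, books_on_hand=41),
]

leads = [r.name for r in flag_for_reporter(sample)]
print(leads)
```

The rule only flags a discrepancy. Deciding whether a flagged school is actually a story remains the reporter's job.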
Findings and Implications for Further Research
I theorized that the Story Discovery Engine model could accelerate the produc-
tion of ideas and stories on a public affairs beat. I prototyped the software and used it
to report on a specific beat. The successful implementation of the project suggests the
Story Discovery Engine model as a valid option for creating impactful news.
The following were among the project’s findings:
- Only a handful of Philadelphia schools seem to have enough books and learning
materials to teach students adequately under the district’s academic guidelines.
- At least 10 schools appear to have no books at all, others seem to have books
that are wildly out of date, and some seem to have only the books that fit the
curriculum guidelines established by a chief academic officer who left the district
years ago.
- Despite investing in custom software to track its textbook inventory, the District
did not require any of its employees to use the software.
- The District spent $111 million on textbooks between 2008 and 2013. Its
inventory showed more than a million books. Nobody knew where they were;
boxes and boxes of books lay unused and un-catalogued in the basement at
District headquarters.
- The District published a recommended core curriculum, but did not know if
any of its schools were using it. There was no systematic way to determine
whether struggling schools had the books and resources they needed for student
success.
These findings, once published, were shared extensively on social media and
prompted a number of changes at the School District of Philadelphia. Outcomes in
subsequent weeks included:
- One highly paid administrator was found to be responsible for a number of
textbook tracking failures. That administrator retired.
- An internal investigation revealed that several school principals were buying
textbooks from sales representatives with whom they had personal relationships
instead of buying the textbooks recommended by the central administration.
Some of the reps were former school principals. This practice was
eliminated and cost savings were achieved (Jessica Diaz, personal communication
2013).
- The School District of Philadelphia closed 24 schools at the end of the
2012–2013 school year, displacing approximately 4000 students. Originally, the
District planned to send all the books from the closing schools to the schools
that were slated to receive the students. Instead, the District collected all of
the books from the closing schools at a central location. An attempt was
made to organize the books and reallocate them judiciously.
- An audit was performed so that the central administration was made aware of
the curriculum officially in use at each school.
- Several local news organizations picked up the investigative stories and
re-published them on their own websites, amplifying the audience for the
stories.
This modest impact suggests that the reporting could be duplicated in other
large cities like Philadelphia, all of which struggle with similar logistical issues around
public education resources.
The Story Discovery Engine model also solves a particular logistical issue that
newsrooms struggle with. A newsroom depends on specialized labor. The writers are
good at writing, the editors are good at editing, the Web producers are good at the
nuances of the content management system, and the programmers are good at writing
programs. It makes sense to have the programmers write the code that teases out the
facts the reporters need to write stories. Getting the reporters to write high-level code
is less practical. However, few newsrooms have the staff that would be required to
write high-level code (McChesney 2012; Parasie and Dagiral 2012; Royal 2010). Writing
code is difficult. Royal writes that the more experience a reporter has, the more they
tend to appreciate the complexity of data journalism:
Experience is correlated to the perceived level of difficulty of working with data jour-
nalism for journalists in general. In this case, the more experience the journalist has,
the more likely he or she is to agree that data journalism is difficult for most journal-
ists. This might indicate that the journalists with some or extensive data journalism
experience tend to value this expertise as unique and a skill that not everyone can
master. (Royal 2010)
Despite the enthusiasm for data journalism, the logistics of performing data journalism
have proved formidable for many news organizations.
Creating a Story Discovery Engine for a metropolitan area, then opening it up to
the public, allows more people to leverage the code to write stories. The engine could
also be implemented by a foundation and opened up to the public; the local press
could use it to write stories without having to fund the development or hire and man-
age a software staff.
A number of story prompts arose over the course of reporting for Stacked Up.
Any of them could be used to write education beat stories in any
district in the United States. Some prompts include:
- Some schools with active home–school associations fundraise for basic school supplies
like paper. Find a school that is fundraising for money for books or paper using social
media. Use Stacked Up to check whether the school seems to have enough books.
Explore a few scenarios:
  - The school may be trying something new and interesting with its curriculum,
    and the home–school association is trying to raise money to support it.
  - The school was not allocated enough money to buy books for its students.
  - The school was allocated enough money for books, but the money went to
    something else.
  - Additional scenarios not mentioned here.
- Use Stacked Up to find a school that seems not to have books. Arrange a visit and
ask to see the book storage room. Are any of the “missing” books sitting in the storage
area? If so, why?
- A school is known to have a one-to-one laptop program where each student receives
a school-issued laptop. The school still uses printed textbooks in addition to the lap-
tops, but uses fewer textbooks. What happened to the books that were in the school
when the laptop program began? Were they redistributed to other students? If not,
where did they go?
- Every time state education standards change, every school needs to buy new books
to match the new standards. When did your state last update its standards?
- Who were the politicians on the committee that made the standards change? Is there
anything intriguing in their campaign donations?
- Districts have guidelines for how long textbooks should stay in use. Generally, a text-
book lasts about five years. What happens to books after they are used for five years?
Are they recycled, or is there a depository?
- In Detroit, the book depository became a dumping ground (Dawsey 2008; Griffioen
2008). What is happening to old books in your city?
- When schools do not have enough books, teachers often compensate by making pho-
tocopies. Find a school that lacks books, and check how much they spend on photo-
copies. Is this an efficient economic choice?
- Some schools claim they have replaced print textbooks with digital textbooks. Digital
textbooks are password-protected. People regularly lose passwords and get locked out
of password-protected systems. Are kids and parents able to get to the digital text-
books when they need them?
- Use Stacked Up to find a school that is using social studies textbooks that are more
than five or eight years old. How do they teach civics or social studies with books that
do not include the name of the current US President?
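Several of these prompts begin with a lookup that a reporter, or a tool built for the reporter, could run against textbook inventory data. As a rough sketch of the last prompt, the Python below filters an inventory for social studies titles older than a chosen age. The row layout, the sample titles, and the function name are hypothetical and do not describe how Stacked Up stores or queries its data.

```python
# Hypothetical inventory rows: (school, subject, title, copyright_year).
inventory = [
    ("School A", "social studies", "Civics Today", 2003),
    ("School A", "math", "Algebra 1", 2012),
    ("School B", "social studies", "Our Nation", 2011),
]


def outdated_titles(rows, subject, max_age_years, as_of_year):
    """Return rows for `subject` whose copyright year is more than `max_age_years` old."""
    return [
        row for row in rows
        if row[1] == subject and as_of_year - row[3] > max_age_years
    ]


# Titles more than five years old as of 2014 (both values chosen for illustration).
old = outdated_titles(inventory, "social studies", max_age_years=5, as_of_year=2014)
for school, _, title, year in old:
    print(f"{school}: {title} ({year})")
```

The same filter, with a different subject or age threshold, would serve the prompt about how long textbooks stay in use.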
These 10 ideas took me about 30 minutes to generate. Each of them could prob-
ably result in a series of at least three stories, plus two follow-up stories based on the
school district’s reaction. That is 50 original investigative stories, an entire year’s worth
of stories for a reporter writing one story a week. An interested reader will probably
generate additional questions while reading the story prompts; each of those questions
might produce five original investigative stories as well. The potential pool of story
ideas could multiply if an entire newsroom of people practiced deliberate
creativity.
Having a virtual fountain of story ideas is especially useful for the modern news-
room, where online publishing means that reporters and editors need to “feed the
beast” almost constantly. Writing only one story a week is a luxury in today’s market-
place, especially at online publications where writers are urged to publish multiple sto-
ries a day and editors may edit 30–40 stories a week (June 2013; Peters 2010).
High-impact investigative stories can take a tremendous amount of time to con-
ceive and report, a timeline that is the opposite of the current market imperative. A
software tool to accelerate the investigative process can add significant value to the
newsroom.
NOTE
1. Books such as The Investigative Reporter’s Handbook (Houston and Investigative
Reporters and Editors, Inc. 2009) offer readers a set of places to look for stories
inside different beats such as education, transportation, or nonprofits. Likewise,
Investigative Reporters and Editors, Inc., the nonprofit formed in 1975 to help
“improve the quality of investigative reporting,” focuses significant educational
efforts on strategies to help reporters find story ideas: a February 2014 electronic
search of the Investigative Reporters and Editors library includes 127 tipsheets for
the search query “investigative story ideas.”
REFERENCES
Appelgren, Ester, and Gunnar Nygren. 2014, February. “Data Journalism in Sweden: Introduc-
ing New Methods and Genres of Journalism into ‘Old’ Organizations.” Digital Journal-
ism: 1–12. doi:10.1080/21670811.2014.884344.
Benfer, Robert Alfred. 1991. Expert Systems. Sage University Papers Series, no. 07-077. Newbury
Park, Calif: Sage. http://dx.doi.org/10.4135/9781412984225.
Boden, Margaret A. 1998. “Creativity and Artificial Intelligence.” Artificial Intelligence 103:
347–356.
boyd, danah, and Kate Crawford. 2012. “Critical Questions for Big Data: Provocations for a
Cultural, Technological and Scholarly Phenomenon.” Information, Communication &
Society 15 (5): 662–679. doi:10.1080/1369118X.2012.678878.
Buchanan, Bruce G. 1985. “Expert Systems.” Journal of Automated Reasoning 1 (1): 28–35.
Cuillier, David. 2011. The Art of Access: Strategies for Acquiring Public Records. Washington,
DC: CQ Press.
Dawsey, Chastity Pratt. 2008. “Unsecured Schools Given up to Thieves, Vandals.” Detroit Free
Press, April 4. http://www.freep.com/apps/pbcs.dll/article?AID=/20080404/NEWS01/804040302.
Department of Human Services, City of Philadelphia. 2012. 2011 Annual Report. Annual
Report. http://www.phila.gov/dhs/pdfs/DHS%20Annual%20report.pdf.
Department of Human Services, City of Philadelphia. 2014. “Children and Youth Division Home
Page.” http://dhs.phila.gov/intranet/pgintrahome_pub.nsf/content/cydhomepage.
Diakopoulos, Nicholas. 2013. “Rage against the Algorithms.” The Atlantic, October 3.
http://www.theatlantic.com/technology/archive/2013/10/rage-against-the-algorithms/280255/.
Diakopoulos, Nicholas. 2014. “Algorithmic Accountability Reporting: On the Investigation of
Black Boxes.” Tow Center for Digital Journalism at Columbia University.
http://towcenter.org/wp-content/uploads/2014/02/78524_Tow-Center-Report-WEB-1.pdf.
Diaz, Jessica. 2013. Personal Communication.
Dick, Murray. 2013, September. “Interactive Infographics and News Values.” Digital Journal-
ism: 1–17. doi:10.1080/21670811.2013.841368.
Domingo, David. 2008. “Interactivity in the Daily Routines of Online Newsrooms: Dealing
with an Uncomfortable Myth.” Journal of Computer-Mediated Communication 13 (3):
680–704. doi:10.1111/j.1083-6101.2008.00415.x.
Feigenbaum, E.A. 1977. “The Art of Artificial Intelligence: Themes and Case Studies of
Knowledge Engineering.” Proceedings IJCAI 5. Cambridge, MA.
Flaounas, Ilias, Omar Ali, Thomas Lansdall-Welfare, Tijl De Bie, Nick Mosdell, Justin Lewis, and
Nello Cristianini. 2013. “Research Methods in the Age of Digital Journalism: Massive-
Scale Automated Analysis of News-Content—Topics, Style and Gender.” Digital
Journalism 1 (1): 102–116. doi:10.1080/21670811.2012.714928.
Flew, Terry, Christina Spurgeon, Anna Daniel, and Adam Swift. 2012. “The Promise of
Computational Journalism.” Journalism Practice 6 (2): 157–171.
doi:10.1080/17512786.2011.616655.
Gans, Herbert J. 2004. Deciding What’s News: A Study of CBS Evening News, NBC Nightly News,
Newsweek and Time. Visions of the American Press. Evanston, Ill:
Northwestern University Press.
Griffioen, James D. 2008. “The Knowledge of What Happened and What Will.” Sweet Juniper.
http://www.sweet-juniper.com/2008/04/knowledge-of-what-happened-and-what.html.
Grimmer, Justin, and Brandon M. Stewart. 2013. “Text as Data: The Promise and Pitfalls of
Automatic Content Analysis Methods for Political Texts.” Political Analysis 21 (3):
267–297. doi:10.1093/pan/mps028.
Hamilton, James T., and Fred Turner. 2009. Accountability through Algorithm: Developing the
Field of Computational Journalism. Center For Advanced Study in the Behavioral Sciences
Summer Workshop: Stanford University.
http://www.stanford.edu/~fturner/Hamilton%20Turner%20Acc%20by%20Alg%20Final.pdf.
Hansen, Kathleen A. 1991. “Source Diversity and Newspaper Enterprise Journalism.” Journalism
& Mass Communication Quarterly 68 (3): 474–482. doi:10.1177/107769909106800318.
Houston, Brant, and Investigative Reporters and Editors, Inc. 2009. The Investigative Reporter’s
Handbook: A Guide to Documents, Databases and Techniques. 5th ed. Boston, MA:
Bedford/St. Martin’s.
Howard, Alexander Benjamin. 2014. The Art & Science of Data-Driven Journalism. Tow/Knight
Reports. Tow Center for Digital Journalism: Columbia University.
http://towcenter.org/wp-content/uploads/2014/05/Tow-Center-Data-Driven-Journalism.pdf.
June, Laura. 2013. “Maura Johnston on Why She Opened Her iPad-Only Magazine to the
Web.” The Verge, July 10.
http://www.theverge.com/2013/7/10/4506824/maura-johnston-on-why-she-opened-her-ipad-only-magazine-to-the-web.
Labbé, Theola, and V. Dion Haynes. 2007. “Rhee Blasts Textbook Process for Letting Supplies
Languish.” The Washington Post, August 4.
http://www.washingtonpost.com/wp-dyn/content/article/2007/08/03/AR2007080302134_pf.html.
López-Ortega, Omar. 2013. “Computer-Assisted Creativity: Emulation of Cognitive Processes
on a Multi-Agent System.” Expert Systems with Applications 40 (9): 3459–3470.
doi:10.1016/j.eswa.2012.12.054.
Marburger, David. 2011. Access with Attitude: An Advocate’s Guide to Freedom of Information
in Ohio. Athens: Ohio University Press.
McChesney, Robert W. 2012. “Farewell to Journalism?: Time for a Rethinking.” Journalism
Practice 6 (5–6): 614–626. doi:10.1080/17512786.2012.683273.
Meyer, Philip. 2002. Precision Journalism: A Reporter’s Introduction to Social Science Methods.
4th ed. Lanham, Md: Rowman & Littlefield.
Obama, Barack. 2009. “Memorandum for the Heads of Executive Departments and Agencies
Re: Transparency and Open Government.” Federal Register.
http://www.whitehouse.gov/the_press_office/TransparencyandOpenGovernment.
Parasie, S., and E. Dagiral. 2012. “Data-Driven Journalism and the Public Good:
‘Computer-Assisted-Reporters’ and ‘Programmer-Journalists’ in Chicago.” New Media & Society 15
(6): 853–871. doi:10.1177/1461444812463345.
Pavlik, John V. 2013. “Innovation and the Future of Journalism.” Digital Journalism 1 (2):
181–193. doi:10.1080/21670811.2012.756666.
Peters, Jeremy W. 2010. “In a World of Online News, Burnout Starts Younger.” The New York
Times, July 18. http://www.nytimes.com/2010/07/19/business/media/19press.html.
Protess, David. 1991. The Journalism of Outrage: Investigative Reporting and Agenda Building
in America. New York: Guilford Press.
Remler, Dahlia K., Don J. Waisanen, and Andrea Gabor. 2013. “Academic Journalism: A Modest
Proposal.” Journalism Studies, August, 1–17. doi:10.1080/1461670X.2013.821321.
Royal, Cindy. 2010. “The Journalist as Programmer: A Case Study of the New York Times
Interactive News Technology Department.” The University of Texas at Austin.
https://online.journalism.utexas.edu/2010/papers/Royal10.pdf.
Scribner, S., and M. Cole. 1973. “Cognitive Consequences of Formal and Informal Education:
New Accommodations Are Needed between School-Based Learning and Learning
Experiences of Everyday Life.” Science 182 (4112): 553–559.
doi:10.1126/science.182.4112.553.
Sternberg, Robert J., ed. 1999. Handbook of Creativity. Cambridge, U.K.; New York:
Cambridge University Press.
Sweeney, Latanya. 2013. “Discrimination in Online Ad Delivery.” Communications of the ACM
56 (5): 44. doi:10.1145/2447976.2447990.
Tuchman, Gaye. 1978. Making News: A Study in the Construction of Reality. New York: Free
Press.
Meredith Broussard, Department of Journalism, Temple University, USA. E-mail:
merbroussard@temple.edu. Web: http://meredithbroussard.com
18 MEREDITH BROUSSARD
Downloaded
by
[Temple
University
Libraries]
at
08:32
15
December
2014

More Related Content

Similar to Artificial Intelligence For Investigative Reporting

The Rise of Data Journalism: The Making of Journalistic Knowledge through Qua...
The Rise of Data Journalism: The Making of Journalistic Knowledge through Qua...The Rise of Data Journalism: The Making of Journalistic Knowledge through Qua...
The Rise of Data Journalism: The Making of Journalistic Knowledge through Qua...
Liliana Bounegru
 
1)  Your Research Project on the surveillance state consists of .docx
1)  Your Research Project on the surveillance state consists of .docx1)  Your Research Project on the surveillance state consists of .docx
1)  Your Research Project on the surveillance state consists of .docx
croftsshanon
 
Data Journalism: chapter from Online Journalism Handbook first edition
Data Journalism: chapter from Online Journalism Handbook first editionData Journalism: chapter from Online Journalism Handbook first edition
Data Journalism: chapter from Online Journalism Handbook first edition
Paul Bradshaw
 
s00146-014-0549-4.pdf
s00146-014-0549-4.pdfs00146-014-0549-4.pdf
s00146-014-0549-4.pdf
EngrAliSarfrazSiddiq
 
From Telling Stories with Data to Telling Stories with Data Infrastructures: ...
From Telling Stories with Data to Telling Stories with Data Infrastructures: ...From Telling Stories with Data to Telling Stories with Data Infrastructures: ...
From Telling Stories with Data to Telling Stories with Data Infrastructures: ...
Liliana Bounegru
 
FINALProject Report for public
FINALProject Report for publicFINALProject Report for public
FINALProject Report for publicHina Pandya
 
International Trade Essay
International Trade EssayInternational Trade Essay
International Trade Essay
Jenny Jackson
 
Scraping the Social? Issues in real-time social research (Departmental Semina...
Scraping the Social? Issues in real-time social research (Departmental Semina...Scraping the Social? Issues in real-time social research (Departmental Semina...
Scraping the Social? Issues in real-time social research (Departmental Semina...
Sociology@Essex
 
Computational Journalism
Computational JournalismComputational Journalism
Computational Journalism
ADGP, Public Grivences, Bangalore
 
A RELIABLE ARTIFICIAL INTELLIGENCE MODEL FOR FALSE NEWS DETECTION MADE BY PUB...
A RELIABLE ARTIFICIAL INTELLIGENCE MODEL FOR FALSE NEWS DETECTION MADE BY PUB...A RELIABLE ARTIFICIAL INTELLIGENCE MODEL FOR FALSE NEWS DETECTION MADE BY PUB...
A RELIABLE ARTIFICIAL INTELLIGENCE MODEL FOR FALSE NEWS DETECTION MADE BY PUB...
caijjournal
 
2053951715611145
20539517156111452053951715611145
2053951715611145
Firas Husseini
 
Big data analysis of news and social media content
Big data analysis of news and social media contentBig data analysis of news and social media content
Big data analysis of news and social media content
Firas Husseini
 
Information disorder: Toward an interdisciplinary framework for research and ...
Information disorder: Toward an interdisciplinary framework for research and ...Information disorder: Toward an interdisciplinary framework for research and ...
Information disorder: Toward an interdisciplinary framework for research and ...
friendscb
 
Social Media Influence Analysis using Data Science Techniques
Social Media Influence Analysis using Data Science TechniquesSocial Media Influence Analysis using Data Science Techniques
Social Media Influence Analysis using Data Science Techniques
Muhammad Bilal
 
Media literacy in the age of information overload
Media literacy in the age of information overloadMedia literacy in the age of information overload
Media literacy in the age of information overload
Gmeconline
 
Argument Essay Outline Example.pdf
Argument Essay Outline Example.pdfArgument Essay Outline Example.pdf
Argument Essay Outline Example.pdf
Kathleen Harvey
 
Media Shifts
Media ShiftsMedia Shifts
Media Shifts
Julia Goldberg
 
The power of Structured Journalism & Hacker Culture in NPR
The power of Structured Journalism & Hacker Culture in NPRThe power of Structured Journalism & Hacker Culture in NPR
The power of Structured Journalism & Hacker Culture in NPR
Poderomedia
 
Making our mark: the important role of social scientists in the ‘era of big d...
Making our mark: the important role of social scientists in the ‘era of big d...Making our mark: the important role of social scientists in the ‘era of big d...
Making our mark: the important role of social scientists in the ‘era of big d...
The Higher Education Academy
 

Similar to Artificial Intelligence For Investigative Reporting (20)

The Rise of Data Journalism: The Making of Journalistic Knowledge through Qua...
The Rise of Data Journalism: The Making of Journalistic Knowledge through Qua...The Rise of Data Journalism: The Making of Journalistic Knowledge through Qua...
The Rise of Data Journalism: The Making of Journalistic Knowledge through Qua...
 
1)  Your Research Project on the surveillance state consists of .docx
1)  Your Research Project on the surveillance state consists of .docx1)  Your Research Project on the surveillance state consists of .docx
1)  Your Research Project on the surveillance state consists of .docx
 
Data Journalism: chapter from Online Journalism Handbook first edition
Data Journalism: chapter from Online Journalism Handbook first editionData Journalism: chapter from Online Journalism Handbook first edition
Data Journalism: chapter from Online Journalism Handbook first edition
 
s00146-014-0549-4.pdf
s00146-014-0549-4.pdfs00146-014-0549-4.pdf
s00146-014-0549-4.pdf
 
From Telling Stories with Data to Telling Stories with Data Infrastructures: ...
From Telling Stories with Data to Telling Stories with Data Infrastructures: ...From Telling Stories with Data to Telling Stories with Data Infrastructures: ...
From Telling Stories with Data to Telling Stories with Data Infrastructures: ...
 
FINALProject Report for public
FINALProject Report for publicFINALProject Report for public
FINALProject Report for public
 
International Trade Essay
International Trade EssayInternational Trade Essay
International Trade Essay
 
Scraping the Social? Issues in real-time social research (Departmental Semina...
Scraping the Social? Issues in real-time social research (Departmental Semina...Scraping the Social? Issues in real-time social research (Departmental Semina...
Scraping the Social? Issues in real-time social research (Departmental Semina...
 
Computational Journalism
Computational JournalismComputational Journalism
Computational Journalism
 
A RELIABLE ARTIFICIAL INTELLIGENCE MODEL FOR FALSE NEWS DETECTION MADE BY PUB...
A RELIABLE ARTIFICIAL INTELLIGENCE MODEL FOR FALSE NEWS DETECTION MADE BY PUB...A RELIABLE ARTIFICIAL INTELLIGENCE MODEL FOR FALSE NEWS DETECTION MADE BY PUB...
A RELIABLE ARTIFICIAL INTELLIGENCE MODEL FOR FALSE NEWS DETECTION MADE BY PUB...
 
2053951715611145
20539517156111452053951715611145
2053951715611145
 
Big data analysis of news and social media content
Big data analysis of news and social media contentBig data analysis of news and social media content
Big data analysis of news and social media content
 
Information disorder: Toward an interdisciplinary framework for research and ...
Information disorder: Toward an interdisciplinary framework for research and ...Information disorder: Toward an interdisciplinary framework for research and ...
Information disorder: Toward an interdisciplinary framework for research and ...
 
Social Media Influence Analysis using Data Science Techniques
Social Media Influence Analysis using Data Science TechniquesSocial Media Influence Analysis using Data Science Techniques
Social Media Influence Analysis using Data Science Techniques
 
Information is knowledge
Information is knowledgeInformation is knowledge
Information is knowledge
 
Media literacy in the age of information overload
Media literacy in the age of information overloadMedia literacy in the age of information overload
Media literacy in the age of information overload
 
Argument Essay Outline Example.pdf
Argument Essay Outline Example.pdfArgument Essay Outline Example.pdf
Argument Essay Outline Example.pdf
 
Media Shifts
Media ShiftsMedia Shifts
Media Shifts
 
The power of Structured Journalism & Hacker Culture in NPR
The power of Structured Journalism & Hacker Culture in NPRThe power of Structured Journalism & Hacker Culture in NPR
The power of Structured Journalism & Hacker Culture in NPR
 
Making our mark: the important role of social scientists in the ‘era of big d...
Making our mark: the important role of social scientists in the ‘era of big d...Making our mark: the important role of social scientists in the ‘era of big d...
Making our mark: the important role of social scientists in the ‘era of big d...
 

Artificial Intelligence For Investigative Reporting

  • 1. ARTIFICIAL INTELLIGENCE FOR INVESTIGATIVE REPORTING
Using an expert system to enhance journalists’ ability to discover original public affairs stories
Meredith Broussard

This paper describes an artificial intelligence-based software system that augments public affairs reporters’ ability to sort through data and identify investigative storytelling opportunities. A prototype of the model was developed and was used to analyze education data. The successful prototype and the social impact of the stories derived from the prototype suggest this approach as a valid option for newsrooms that seek to tell more compelling, data-rich stories about public affairs issues.

KEYWORDS artificial intelligence; computational journalism; data journalism; expert systems; innovation; public affairs journalism

Introduction

“Readers don’t care about bureaucracy,” one of my colleagues tells her students on the first day of her public affairs journalism class. “To make people care about public affairs, you have to tell a story that taps into our shared humanity.” The work of telling routine public affairs stories becomes second nature to a beat reporter. But for an investigative reporter, storytelling requires an additional layer of cognitive complexity. The investigative reporter must come up with an original idea, a creative act, and must then find sources and turn the idea into a narrative. Ideas are easy to generate. Original ideas are much harder. Original ideas that can turn into successful investigative stories are even more difficult to create. Once the idea exists, the timeline is uncertain: investigative stories can take a very long time to conceive and report. Many of today’s economically challenged newsrooms do not feel they can afford such a luxury. While a computer cannot generate original story ideas, computational methods for accelerating human creativity offer a possible solution for newsrooms seeking to amplify their investigative reporting capacity.
This paper describes a model for leveraging artificial intelligence to accelerate the process of discovering investigative ideas on public affairs beats such as education, transportation, or campaign finance.

Digital Journalism, 2014 http://dx.doi.org/10.1080/21670811.2014.985497 © 2014 Taylor & Francis
Downloaded by [Temple University Libraries] at 08:32 15 December 2014
  • 2. The model, which I call the “Story Discovery Engine,” derives from a type of artificial intelligence software called an expert system. In this paper, I explain how the engine works to facilitate the discovery of investigative ideas. First, I outline the conceptual process involved in generating new investigative story ideas. I describe expert systems, outline one of the logical rules embedded in the software, and show the difference between a classical expert system and the Story Discovery Engine. I demonstrate how I tested the system by developing a prototype of the Story Discovery Engine that analyzed education data from the School District of Philadelphia, the eighth-largest school district in the United States. That prototype was published online as a project called “Stacked Up.” Stacked Up consists of a set of investigative stories as well as a reporting tool made of dynamic, customizable data visualizations inside a narrative framework. I summarize the investigative stories that were produced from the reporting tool and the policy changes that resulted. The implementation and resulting investigative news stories, plus the project’s impact, suggest that the Story Discovery Engine model can add value to investigative reporting.

Creativity and Investigative Story Ideas

Social scientists have engaged with the notion of investigative reporting as a cultural construction produced inside a particular organizational culture (Gans 2004; Tuchman 1978). For the purposes of this paper, investigative reporting is defined as a type of enterprise journalism that is produced over time, outside of the day-to-day deadline crunch, and includes diverse sources (Hansen 1991). The cognitive process of coming up with an original investigative story idea is a creative act under Sternberg’s (1999) definition of creativity as the production of work that is both novel (as in original) and appropriate (as in useful).
Experienced investigative reporters build up a set of strategies for finding story ideas, but novice investigative reporters often struggle to find opportunities for novel enterprise stories. Training and education materials for novice investigators focus on places to look for stories: follow the money, look at specific lines on financial filings, and so on.1 The complexity of the process is part of the reason that so much investigative journalism is reactive, resulting from a tip from a whistle blower, rather than proactive (Protess 1991).

Journalism innovation theorists have suggested that tremendous possibilities exist in analyzing data to find investigative ideas (Appelgren and Nygren 2014; Dick 2013; Flaounas et al. 2013; Pavlik 2013). Hamilton and Turner (2009) write that the future of watchdog journalism may be found in using algorithms (precisely defined problem-solving procedures) for accountability:

The best algorithms will essentially be public interest data mining. They will generate leads, hunches, and anomalies to investigate. It will remain for reporters and others interested in government performance to take the next step of tracking down the story behind the data pattern.

Tracking down a story in data requires specialized technical skills (to do the data-crunching) as well as journalistic expertise (to refine the story idea and craft appropriate prose). These skills until recently have tended to be segregated into different job
  • 3. categories and experience levels. A novice reporter might have sufficient technical skills to use pivot tables in a spreadsheet, for example, but might not have sufficient job experience to know that pivot table analysis could be applied to monthly government data releases on a particular beat. The promise of computational journalism is that such walls would be broken down through collaboration and training (Flew et al. 2012). A successful computational journalism project might thus be described as one that uses computational thinking to bridge a knowledge gap.

This knowledge gap between the experienced and the novice reporter involves two types of knowledge: formal and informal (Scribner and Cole 1973). Formal knowledge includes rules of a system, as in knowing the rules of English grammar. An experienced education reporter has formal knowledge of his or her state’s laws and policies around education. Informal knowledge includes domain expertise and rules of thumb based on experience. Informal knowledge for an experienced investigative reporter might include a rule of thumb like this:

If you have a natural disaster like Hurricane Sandy, and there is a big pool of money for hurricane relief, some of those funds will be misused; after a natural disaster, always follow up and find out where things went wrong with the government funds, and you’ll find a story.

To come up with ideas the way an experienced reporter would, the novice reporter needs the informal knowledge that the experienced reporter has about where to find stories plus some of the formal knowledge about education policies.

Origin of the Project

In 2011, I found myself staring into exactly this type of knowledge gap. I was an experienced reporter, but not on the public affairs beat. I wanted to investigate a question in education: do Philadelphia public school children have enough books to learn the material on the state-mandated standardized tests?
I had data, I had methods, but I did not have contacts. I wanted to talk to parents, teachers, and students at the city’s best schools, and the city’s worst schools, and see if there was a difference in the students’ access to books. To do that, I needed to figure out which were the best schools, and which were the worst schools; I also needed to find people to talk to at each. There were more than 200 schools. The task was daunting. Educational data is abundant, but the specific analysis I wanted had not been done before. It also involved numerous interdependencies and micro-judgments.

To investigate the story I wanted to write, I turned to data journalism. Data journalism is the practice of finding stories in numbers, and using numbers to tell stories (Broussard, quoted in Howard 2014). It is an evolving practice (Appelgren and Nygren 2014) that may also be called data-driven journalism or computational journalism. Public affairs reporting is particularly suited to data journalism, and specifically expert system analysis, because public affairs reporting depends on interpreting the rules of a local system. An education beat reporter must be familiar with a dizzying array of laws and policies at the federal and state level. Fortunately, these laws and policies are articulated in text-based rules that are easily available online. The government uses data to track the success of its programs, and that data is frequently published
  • 4. online. Other data sets are available to reporters and citizens under the Freedom of Information Act of 1966. Clearly articulated rules in the real world can be translated easily into computer logic rules. Applying the rules to the data allows the computational “intelligence” to uncover social problems.

Thus, the first step was creating a software system that would do some of the necessary investigative thinking for me. Embedding formal and informal knowledge into the software would allow me (or any other reporter) to use the software as a reporting tool to refine story ideas and more efficiently find sources.

This is the essence of the Story Discovery Engine. It is possible to take some of the experienced reporter’s knowledge, abstract out the high-level rules or heuristics, and implement these rules inside an expert system in the form of database logic. The data about the real world is fed in, the logical rules are applied, and the system presents the reporter with a visual representation of a specific site within the system.

The Prototype

An investigation often arises when a reporter perceives a difference between what is (the observed reality) and what should be (as articulated in law or policy). A high-impact investigative story looks at a situation where what is differs from what should be, and explains why. The reader can then use the narrative to create or enact a path to remedy the situation.

The idea for Stacked Up arose from just such a difference. “The school is terrific,” my neighbor said of her daughters’ public school, considered one of the best in the city.
“But if you’re a parent there, you have to be prepared to do a lot of fundraising for basic things like textbooks.” A few years later, I noticed that I was getting the same email at the beginning of every semester from the students in my college classes: it said that the student was very sorry, but he or she could not do the homework because the course books had not yet arrived in the mail. Those students always seemed to be the students who received the lowest grades at the end of the semester. It made sense: they could not do the work required to pass the class if they did not have the books. I wondered: could book shortages be a factor in Philadelphia public schools’ consistently low standardized test scores? (Many parents do not have the resources to fundraise to get books for a school; my neighbor is an outlier, as are many of the other parents at that particular school.) The District currently has 131,262 students in grades pre-kindergarten through 12, 87.3 percent of whom are economically disadvantaged. This is a significant issue because even if parents at each school fundraised, they might not be able to raise enough money to buy all of the books needed.

Most people would be surprised at the idea that a public school would not have enough books. After all, Pennsylvania law specifically says that the state provides books. In Philadelphia, however, students and parents regularly complain of textbook shortages. A 10th grader at Parkway West High School told me that students often have to share books in class and cannot take them home to do homework. Many books are in poor condition: “There were pictures of testicles drawn on every page,” she said of one of her ninth-grade books. The logistical challenges of getting multiple books to hundreds of thousands of students at hundreds of schools overwhelm many major school districts (Labbé and Haynes 2007).
  • 5. Access to books is particularly critical because a school today is labeled a success or failure based on students’ performance on high-stakes tests. The tests are highly specific and are aligned with state educational standards. The tests are also aligned with the textbooks sold by the three educational publishers that dominate the educational publishing market. These same publishers design and grade the standardized tests. It therefore stands to reason that if students do not have the right textbooks, they will not be able to do well on the tests even if they want to.

Answering the question whether a single school has enough books is complex because each student in each grade studies at least four subjects every year. Asking if there are enough books in an entire school district is a massive task. With more than 200 schools, the School District of Philadelphia is the eighth largest school district in the country. Many of the schools have high student turnover because students switch schools as they navigate the child welfare or juvenile justice systems (Department of Human Services, City of Philadelphia 2012). The Children and Youth Division of the Philadelphia Department of Human Services serves an estimated 20,000 children and their families each year (Department of Human Services, City of Philadelphia 2014).

This background helped to pose what became the central research question: are enough books available for Philadelphia students to allow them to prepare adequately for state-mandated standardized tests?

I designed an algorithm and a database architecture that would let me calculate the answer to my investigative question. The algorithm is designed to check whether students are provided with the materials specified in the rules of the educational system. If they are not, there is likely to be a violation, and there is probably an opportunity for a story.
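The check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Stacked Up implementation: the record layout (`GradeRecord`), the function name `find_shortfalls`, and the sample school are hypothetical, and "enough" is assumed here to mean at least one copy of each required item per enrolled student.

```python
from dataclasses import dataclass, field

# Core subjects as named in the paper; everything else below is hypothetical.
CORE_SUBJECTS = ("math", "reading", "social studies", "science")


@dataclass
class GradeRecord:
    """One grade at one school (illustrative layout, not the real schema)."""
    school: str
    grade: int
    enrollment: int
    # Required item names per subject, e.g. {"math": ["textbook", "workbook"]}
    required: dict = field(default_factory=dict)
    # Copies on hand, keyed by (subject, item name)
    inventory: dict = field(default_factory=dict)


def find_shortfalls(records):
    """Return (school, grade, subject, item, missing_copies) for every gap."""
    shortfalls = []
    for rec in records:
        for subject in CORE_SUBJECTS:
            for item in rec.required.get(subject, ()):
                copies = rec.inventory.get((subject, item), 0)
                # Assumption: fewer copies than students counts as "not enough".
                if copies < rec.enrollment:
                    shortfalls.append(
                        (rec.school, rec.grade, subject, item,
                         rec.enrollment - copies)
                    )
    return shortfalls


if __name__ == "__main__":
    sample = [
        GradeRecord(
            school="Example Elementary",  # made-up school
            grade=3,
            enrollment=30,
            required={"math": ["textbook", "workbook"]},
            inventory={("math", "textbook"): 30, ("math", "workbook"): 12},
        )
    ]
    for row in find_shortfalls(sample):
        print(row)
```

Each returned tuple is a lead rather than a finding: a reporter would still need to confirm the inventory figures with the school before treating a gap as a story.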
Implementing the Prototype

The Story Discovery Engine prototype launched online as a project called “Stacked Up.” It has two parts: it is both a reporting tool and a presentation system for the stories I wrote using the reporting tool. The presentation system provides the user with a set of investigative stories and some explanatory text about the project (see Figure 1). The reporting tool is a set of dynamic data visualizations that allowed me to write the investigative stories. The statistics and data that supported each story were original, derived from the data analysis resulting from the algorithm that forms the backbone of the project.

In the reporting tool view, the reporter sees a page representing a single school. The page shows different types of data, organized so that specific types of investigative questions can be easily answered (see Figure 2). Some such questions include:

How many students are in each grade in this school?
Where is the school located in the city?
How do this school’s test results compare to the rest of the district?
Do there seem to be enough books for the students enrolled?

The system design anticipates the data points that a reporter needs to write a data-rich story and presents them in a centralized, easy-to-navigate format. The reporter leverages their domain expertise, clicks around to adjust some what-ifs to prompt the
  • 6. FIGURE 1 Presentation system and reporting tool shown on project home page
FIGURE 2 Reporting tool view
  • 7. creative process, and comes up with a story idea. Because the story idea is targeted, it immediately becomes easier to identify appropriate sources.

The key is that the software does not try to solve a problem faced by all journalists on every beat. It tries to solve a specific problem on a specific beat, and in the process creates a way to solve other problems on that same beat. The Story Discovery Engine prototype was created and applied to education data, but the model can easily be applied to other beats as well.

A list of the rules used in the system is beyond the scope of this paper, as is a depiction of the object model used to represent relationships between the entities involved; however, additional technical details are available by request. For the sake of description, however, one of the rules could be explained as follows:

Core_subjects = math, reading, social studies, science.
School_curriculum = a curriculum package published by a major educational publisher (e.g., “Everyday Math”).
Necessary_material = the minimum books or workbooks necessary to teach the school’s curriculum package. This often means two items: a textbook and workbook.

For each school in School_District
    For each grade in school
        For each Core_subject
            For each Necessary_material in School_curriculum
                If NumberOf(students_in_grade) = NumberOf(necessary_material)
                    Then Enough_materials = yes
                Else Enough_materials = no.

Once the prototype existed, I looked at the data analysis and interviewed people to validate the findings. I developed hypotheses, reported them out, revised the hypotheses, and considered story formats as part of a months-long process. As predicted, the data revealed multiple potential stories about how books were “stacked up” in Philadelphia city schools.

Theoretical Background

The Story Discovery Engine draws on adjacent, occasionally overlapping concepts from the fields of communication, cognition, and computation.
I will explain each in turn and how it relates to the Story Discovery Engine. These fields are not generally placed in dialogue with each other, but there are enormous productive possibilities if they are put together in conversation.

Computation

The Story Discovery Engine software belongs to a class of artificial intelligence programs called knowledge-based expert systems. Benfer offers an excellent definition:
Expert systems are computer programs that perform sophisticated tasks once thought possible only for human experts. If performance were the sole criterion for labelling a program an expert system, however, many decision support systems, statistical analyses, and spreadsheet programs could be called expert systems. Instead, the term “expert system” is generally reserved for systems that achieve expert-level performance, using artificial intelligence programming techniques such as symbolic representation, inference, and heuristic search (Buchanan 1985). Knowledge-based systems can be distinguished from other branches of artificial intelligence research by their emphasis on domain-specific knowledge, rather than more general problem-solving strategies. Because their strength derives from such domain-specific knowledge rather than more general problem-solving strategies (Feigenbaum 1977), expert systems are often called “knowledge-based.” Since the knowledge of experts tends to be domain-specific rather than general, most expert systems representing this knowledge reflect the specialized nature of such expertise. (Benfer 1991, 4)

Benfer argues that expert systems can provide an important mechanism for prompting new social science thinking, and expert system developers can learn from social scientists’ rigorous methods of data collection and validation. He was the first to deploy an expert system in journalism:

MUckraker, an expert system under development by New Directions in News and the Investigative Reporters and Editors Association at Missouri University, is a program to advise investigative reporters on how to approach people for interviews, how to prepare for those interviews, and how to examine a wide range of public documents in the conduct of an investigation.
This program is designed to act much as an expert investigative reporter might, advising the user on strategies to try when sources are reluctant to be interviewed, pointing out documents that might be relevant to the investigation, and advising the user on how to organize his or her work. (Benfer 1991, 4)

Under the expert system model Benfer describes, the expert system would deliver to the reporter “advice” about whether the quantity of books in a school would be the appropriate basis for a story. The innovation in the Story Discovery Engine is that instead of advice, the expert system delivers an interactive data visualization. The data visualization is specifically designed to answer the most common questions a reporter might ask in order to assess whether a story might be found at a particular school. I decided that relying on the human reporter’s judgment was more efficient than relying on a computer’s for assessing newsworthiness in this case because the system is designed to be used in the deadline-driven, time-sensitive environment of a newsroom.

The notion that computer-based quantitative methods should augment humans, not replace them, is one of the principles of automated text analysis put forward by Grimmer and Stewart (2013) in their analysis of possible pitfalls in automated content analysis. In recent years, communication scholars have frequently used the human workers who participate in Amazon’s Mechanical Turk in order to code content in large data sets. In the Story Discovery Engine model, the reporter is a similarly essential part of the system (see Figure 3). Using the vast “computational” resources of the human brain, the reporter takes only moments to look at the data revealed by the system, leverage formal and informal knowledge, and make a judgment about the likelihood of a story. It would require vast
amounts of computing power to get the computer to draw the same conclusions; also, it would take years to tease out all of the subtleties of human news judgment and implement them computationally. The human brain thus becomes an efficient part of the story-generating process, aided and augmented by the computational system.

It is significant that Benfer used social science methods in crafting an expert system for journalism. Social science thinking is at the heart of what today we call data journalism. Meyer pioneered the application of social science methods to journalism in his 1967 Pulitzer Prize-winning story about race riots in Detroit; those methods were later codified in Precision Journalism: A Reporter’s Introduction to Social Science Methods (Meyer 2002). Precision journalism methods informed computer-assisted reporting, which flourished in the 1980s with the advent of desktop computers in the newsroom. Today’s online data journalists are incubated and organized by the Investigative Reporters and Editors Association through the National Institute for Computer-Assisted Reporting, which offers the Phil Meyer Reporting Award for a data-driven project each year.

FIGURE 3 A classical expert system compared to the Story Discovery Engine

Three other computation concepts deserve mention: open data, open source, and big data. Data journalism can only flourish if data sets are available. Structural changes in the US government have allowed data to be more freely distributed. Influenced in part by the open data movement, President Barack Obama (2009) released a memorandum declaring a new openness around data access and availability. “My Administration is committed to creating an unprecedented level of openness in Government,” it reads.

Information maintained by the Federal Government is a national asset. My Administration will take appropriate action, consistent with law and policy, to disclose information rapidly in forms that the public can readily find and use. Executive departments and agencies should harness new technologies to put information about their operations and decisions online and readily available to the public. (Obama 2009)

The idea is that citizens can take government data and analyze it to increase transparency and accountability. The Story Discovery Engine is an intentional system: its analysis is presented with the intent of increasing government accountability. It is nonpartisan software, but it proceeds from the assumption that there are problems in the social system that need to be exposed through the available data.

Open data is often mentioned in conjunction with open source software tools. Stacked Up was implemented using almost exclusively open source tools. It consists of 43,000 lines of code, all of which are available on an open source version control site called GitHub. Just like the data it analyzes, the software is publicly available for anyone to peruse and fact-check. This adds an extra layer of transparency to a transparency-producing activity.
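To give a sense of what a reader perusing such code might find, the textbook rule described earlier in pseudocode can be sketched in Python. This is an illustrative sketch only, not the Stacked Up source code: the function name `check_materials`, the `Finding` record, the shapes of the three dictionaries, and the sample school are all assumptions introduced here for the example.

```python
from dataclasses import dataclass

# Core subjects named in the rule description.
CORE_SUBJECTS = ("math", "reading", "social studies", "science")


@dataclass(frozen=True)
class Finding:
    """One rule evaluation (hypothetical record, not from the paper)."""
    school: str
    grade: int
    subject: str
    material: str
    enough_materials: bool


def check_materials(enrollment, curriculum, inventory):
    """Evaluate the rule for every school, grade, core subject, and material.

    Assumed input shapes (hypothetical):
      enrollment: {school: {grade: number_of_students}}
      curriculum: {(school, subject): [necessary material titles]}
      inventory:  {(school, grade, material): copies_on_hand}
    """
    findings = []
    for school, grades in enrollment.items():
        for grade, students in grades.items():
            for subject in CORE_SUBJECTS:
                for material in curriculum.get((school, subject), []):
                    copies = inventory.get((school, grade, material), 0)
                    # The rule as stated is an equality test, so a surplus
                    # of copies is also flagged as "no" in this sketch.
                    enough = copies == students
                    findings.append(
                        Finding(school, grade, subject, material, enough)
                    )
    return findings


# Invented sample data for a single grade in a single school.
enrollment = {"School A": {3: 25}}
curriculum = {
    ("School A", "math"): ["Everyday Math textbook", "Everyday Math workbook"],
}
inventory = {
    ("School A", 3, "Everyday Math textbook"): 25,
    ("School A", 3, "Everyday Math workbook"): 10,
}

for finding in check_materials(enrollment, curriculum, inventory):
    answer = "yes" if finding.enough_materials else "no"
    print(finding.school, finding.grade, finding.material, answer)
```

With the invented sample data, the textbook satisfies the rule and the workbook does not. In the model described in this paper, the judgment about whether that gap is a story remains with the reporter.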
It is worth mentioning at this point the relationship between software tools and reporters’ productivity. Several Web-based tools have been developed to help journalists be more efficient at their investigative tasks. Tabula, for example, turns PDFs into text. One of the most consistent points of conflict between reporters and officials is the way that the officials provide information. Entire books have been written about the nuances of negotiating for access to public records (Cuillier 2011; Marburger 2011). A successful tool for investigative journalism allows reporters to surmount common difficulties that interfere with reporting. Likewise, several data visualization tools have become popular to use on structured data. Putting census data into a data visualization tool like Tableau, which displays maps and bubble charts and other forms, allows the reporter to see patterns that would otherwise be invisible.

A small but growing subset of journalists is comfortable using data to enhance their abilities to investigate stories. However, those reporters are limited to using the number of data sets that they, or their newsroom team, can manage. Analyzing one data set is usually enough for a story. Analyzing two or three data sets and turning them into a story package requires a team that includes a programmer, designer, writer, and editor (Domingo 2008; Parasie and Dagiral 2012; Royal 2010).

This is where big data comes in. The next frontier in investigative reporting is using a computer to analyze multiple data sets at a time. “Big data” means many things: lots of data (meaning a large quantity of data, as in terabytes or yottabytes) or lots of different types of data (meaning a great number of data sets) (boyd and Crawford 2012). Each is difficult in a newsroom. Newsrooms
tend to have minimal equipment (Domingo 2008), and it is hard to justify to an editor why a reporter would need thousands of dollars’ worth of specialized equipment to analyze terabytes of data. It is also hard to crunch a number of data sets in a newsroom because it requires computer-programming expertise. Reporters have to either develop their own programming skills (which is difficult) or convince an editor to devote in-house programming expertise to the project (which is also difficult, because the few programmers in newsrooms tend to be overextended). Resource and personnel shortages are practical reasons why big data analysis seldom happens in the newsroom (Royal 2010).

A software system, properly implemented, can shortcut this long process and can make more efficient use of limited newsroom developer resources. Stacked Up analyzes 15 data sets, which is more than a typical newsroom can handle given staffing and time constraints. It took three developers six months to implement, which is more time than can usually be devoted to a news development project. However, now that the system architecture exists, the analysis can be replicated in other states or districts in a matter of days or weeks, not months. The system is based on standardized data, which (as the name suggests) does not vary significantly. This is consistent with a software design principle of “write once, run anywhere.” Any newsroom can take the software, analyze local data, and generate dozens of original investigative stories that matter to the newsroom’s specific audience. The Story Discovery Engine is a tool to improve productivity in both original investigative ideas and sources.

Communication

The project derives from two significant theories about the future of news. The first is the paradigm proposed by Remler, Waisanen, and Gabor (2013): that collaborative efforts between journalists, programmers, academics, and foundations provide opportunities for innovation.
Stacked Up was created out of a partnership between a nonprofit journalism organization under the aegis of Temple University’s Center for Public Interest Journalism (CPIJ) and me, an independent journalist and academic. CPIJ founded the organization with funding from the William Penn Foundation and the Wyncote Foundation. The team also looked at best practices developed and publicized by data journalism organizations. Data teams at ProPublica, the Chicago Tribune, and the Washington Post all maintain “nerd blogs” that they use to communicate methodology behind their data projects; methodologies are also discussed on Source, a data blog maintained by the Mozilla Foundation.

The other significant theoretical concept behind Stacked Up is the notion of accountability through algorithm. In “Accountability Through Algorithm: Developing the Field of Computational Journalism,” Hamilton and Turner (2009) define computational journalism (of which data journalism is a subset) as: “The combination of algorithms, data, and knowledge from the social sciences to supplement the accountability function of journalism.” They write that computational journalism has the potential to help sustain watchdog reporting because it can “hold leaders accountable, unmask malfeasance, and make visible critical social trends.”

Accountability through algorithm can mean reverse-engineering an algorithm to discover how a company used an algorithm to influence the public (Diakopoulos 2013,
2014; Sweeney 2013) or it can mean designing an algorithm that is used to hold decision-makers accountable. I employ the latter meaning.

Cognition

To understand the cognitive labor-saving dimension of the Story Discovery Engine model, it is useful to consider the role of creativity in newsroom production. Reporters use what López-Ortega (2013) calls “deliberate creativity” in order to create original prose on deadline. Spontaneous creativity, or waiting for inspiration to strike, does not allow reporters to meet the demands of the job. Reporters employ a set of creative problem-solving strategies to generate ideas, create interview questions, observe events, and synthesize this information into prose that conforms to the appropriate publication style (Gans 2004; Tuchman 1978). Boden writes of the creative process:

Creativity is a fundamental feature of human intelligence, and a challenge for AI [Artificial Intelligence]. AI techniques can be used to create new ideas in three ways: by producing novel combinations of familiar ideas; by exploring the potential of conceptual spaces; and by making transformations that enable the generation of previously impossible ideas. (Boden 1998, 347)

Many human beings—including (for example) most professional scientists, artists, and jazz-musicians—make a justly respected living out of exploratory creativity. That is, they inherit an accepted style of thinking from their culture, and then search it, and perhaps superficially tweak it, to explore its contents, boundaries, and potential. But human beings sometimes transform the accepted conceptual space, by altering or removing one (or more) of its dimensions, or by adding a new one. Such transformation enables ideas to be generated which (relative to that conceptual space) were previously impossible. The more fundamental the transformation, and/or the more fundamental the dimension that is transformed, the more different the newly-possible structures will be. (Boden 1998, 348)

A computer interface can provide the “fundamental transformation” that Boden calls for:

It can be said that deliberate creativity is facilitated by objective manipulation of a conceptual space. Also, the iterative process that triggers spontaneous creativity can be promoted by computer programs that transform repeatedly interim creations, while a creative subject judges their value. This iterative activity leads to preserve, change, combine or erase parameters as thought convenient. Therefore, computer-assisted software must facilitate both, deliberate and spontaneous creativity. To do so, cognitive processes associated to creativity, as well as their complex interplay, must be characterized properly and then a computational solution can be proposed and implemented. (López-Ortega 2013, 3460)

A computer-assistance tool to enhance creativity must possess algorithms that help computing divergent exploration. The outcome of divergent exploration must be unique ideas. In this sense, a software tool must help overcoming the inherent limits of the individual for producing divergent solutions. (López-Ortega 2013, 3461)
The Story Discovery Engine helps the individual overcome “inherent limits” because it analyzes more data sets than an individual could analyze alone. It tests levels of meaning embedded in social rules: if we have ideals of equal access to education, and if we have a public education system with standards, and if we have state-mandated assessments that measure how well students have met those standards, and if we have teachers who are provided with the standards, and if we grant that objects (books or other learning materials) are necessary to practice the material and concepts associated with the standards: is this an equal system? If not, do we have enough money to make it equal? If not, what do we do?

The rules embedded in the expert system correspond to the rules articulated in laws and public policies. Ordinarily, only a subject matter expert would be able to render judgments about whether a scenario is within the law or not. The Story Discovery Engine makes some of these decisions for the reporter, freeing the reporter up for higher-level cognitive imaginings.

Findings and Implications for Further Research

I theorized that the Story Discovery Engine model could accelerate the production of ideas and stories on a public affairs beat. I prototyped the software and used it to report on a specific beat. The successful implementation of the project suggests the Story Discovery Engine model as a valid option for creating impactful news. The following were among the project’s findings:

- Only a handful of Philadelphia schools seem to have enough books and learning materials to teach students adequately under the district’s academic guidelines.
- At least 10 schools appear to have no books at all, others seem to have books that are wildly out of date, and some seem to have only the books that fit the curriculum guidelines established by a chief academic officer who left the district years ago.
- Despite investing in custom software to track its textbook inventory, the District did not require any of its employees to use the software.
- The District spent $111 million on textbooks between 2008 and 2013. Its inventory showed more than a million books. Nobody knew where they were; boxes and boxes of books lay unused and un-catalogued in the basement at District headquarters.
- The District published a recommended core curriculum, but did not know if any of its schools were using it. There was no systematic way to determine whether struggling schools had the books and resources they needed for student success.

These findings, once published, were shared extensively on social media and prompted a number of changes at the School District of Philadelphia. Outcomes in subsequent weeks included:

- One highly paid administrator was found to be responsible for a number of textbook tracking failures. That administrator retired.
- An internal investigation revealed that several school principals were buying textbooks from sales representatives with whom they had personal relationships instead of buying the textbooks recommended by the central administration. Some of the reps were former school principals. This practice was eliminated and cost savings were achieved (Jessica Diaz, personal communication 2013).
- The School District of Philadelphia closed 24 schools at the end of the 2012–2013 school year, displacing approximately 4000 students. Originally, the District planned to send all the books from the closing schools to the schools that were slated to receive the students. Instead, the District collected all of the books from the closing schools at a central location. An attempt was made to organize the books and reallocate them judiciously.
- An audit was performed so that the central administration was made aware of the curriculum officially in use at each school.
- Several local news organizations picked up the investigative stories and re-published them on their own websites, amplifying the audience for the stories.

This modest impact suggests that the reporting could be duplicated in other large cities like Philadelphia, all of which struggle with similar logistical issues around public education resources.

The Story Discovery Engine model also solves a particular logistical issue that newsrooms struggle with. A newsroom depends on specialized labor. The writers are good at writing, the editors are good at editing, the Web producers are good at the nuances of the content management system, and the programmers are good at writing programs. It makes sense to have the programmers write the code that teases out the facts the reporters need to write stories. Getting the reporters to write high-level code is less practical. However, few newsrooms have the staff that would be required to write high-level code (McChesney 2012; Parasie and Dagiral 2012; Royal 2010). Writing code is difficult.
Royal writes that the more experience a reporter has, the more they tend to appreciate the complexity of data journalism:

Experience is correlated to the perceived level of difficulty of working with data journalism for journalists in general. In this case, the more experience the journalist has, the more likely he or she is to agree that data journalism is difficult for most journalists. This might indicate that the journalists with some or extensive data journalism experience tend to value this expertise as unique and a skill that not everyone can master. (Royal 2010)

Despite the enthusiasm for data journalism, the logistics of performing data journalism have proved formidable for many news organizations.

Creating a Story Discovery Engine for a metropolitan area, then opening it up to the public, allows more people to leverage the code to write stories. The engine could also be implemented by a foundation and opened up to the public; the local press could use it to write stories without having to fund the development or hire and manage a software staff.

A number of story prompts arose over the course of reporting for Stacked Up. Any of them could be used to write education beat stories in any district in the United States. Some prompts include:
- Some schools with active home–school associations fundraise for basic school supplies like paper. Find a school that is using social media to raise money for books or paper. Use Stacked Up to check whether the school seems to have enough books. Explore a few scenarios:
    - The school may be trying something new and interesting with its curriculum, and the home–school association is trying to raise money to support it.
    - The school was not allocated enough money to buy books for its students.
    - The school was allocated enough money for books, but the money went to something else.
    - Additional scenarios not mentioned here.
- Use Stacked Up to find a school that seems not to have books. Arrange a visit and ask to see the book storage room. Are any of the “missing” books sitting in the storage area? If so, why?
- A school is known to have a one-to-one laptop program where each student receives a school-issued laptop. The school still uses printed textbooks in addition to the laptops, but uses fewer textbooks. What happened to the books that were in the school when the laptop program began? Were they redistributed to other students? If not, where did they go?
- Every time state education standards change, every school needs to buy new books to match the new standards. When did your state last update its standards?
- Who were the politicians on the committee that made the standards change? Is there anything intriguing in their campaign donations?
- Districts have guidelines for how long textbooks should stay in use. Generally, a textbook lasts about five years. What happens to books after they are used for five years? Are they recycled, or is there a depository?
- In Detroit, the book depository became a dumping ground (Dawsey 2008; Griffioen 2008). What is happening to old books in your city?
- When schools do not have enough books, teachers often compensate by making photocopies. Find a school that lacks books, and check how much they spend on photocopies. Is this an efficient economic choice?
- Some schools claim they have replaced print textbooks with digital textbooks. Digital textbooks are password-protected. People regularly lose passwords and get locked out of password-protected systems. Are kids and parents able to get to the digital textbooks when they need them?
- Use Stacked Up to find a school that is using social studies textbooks that are more than five or eight years old. How do they teach civics or social studies with books that do not include the name of the current US President?

These 10 ideas took me about 30 minutes to generate. Each of them could probably result in a series of at least three stories, plus two follow-up stories based on the school district’s reaction. That is 50 original investigative stories, an entire year’s worth of stories for a reporter writing one story a week. An interested reader will probably generate additional questions while reading the story prompts; each of those questions might produce five original investigative stories as well. The potential pool of story
ideas could multiply if an entire newsroom of people practiced deliberate creativity.

Having a virtual fountain of story ideas is especially useful for the modern newsroom, where online publishing means that reporters and editors need to “feed the beast” almost constantly. Writing only one story a week is a luxury in today’s marketplace, especially at online publications where writers are urged to publish multiple stories a day and editors may edit 30–40 stories a week (June 2013; Peters 2010).

High-impact investigative stories can take a tremendous amount of time to conceive and report, a timeline that is the opposite of the current market imperative. A software tool to accelerate the investigative process can add significant value to the newsroom.

NOTE

1. Books such as The Investigative Reporter’s Handbook (Houston and Investigative Reporters and Editors, Inc. 2009) offer readers a set of places to look for stories inside different beats such as education, transportation, or nonprofits. Likewise, Investigative Reporters and Editors, Inc., the nonprofit formed in 1975 to help “improve the quality of investigative reporting,” focuses significant educational efforts on strategies to help reporters find story ideas: a February 2014 electronic search of the Investigative Reporters and Editors library includes 127 tipsheets for the search query “investigative story ideas.”

REFERENCES

Appelgren, Ester, and Gunnar Nygren. 2014, February. “Data Journalism in Sweden: Introducing New Methods and Genres of Journalism into ‘Old’ Organizations.” Digital Journalism: 1–12. doi:10.1080/21670811.2014.884344.
Benfer, Robert Alfred. 1991. Expert Systems. Sage University Papers Series, no. 07-077. Newbury Park, Calif: Sage. http://dx.doi.org/10.4135/9781412984225.
Boden, Margaret A. 1998. “Creativity and Artificial Intelligence.” Artificial Intelligence 103: 347–356.
boyd, danah, and Kate Crawford. 2012.
“Critical Questions for Big Data: Provocations for a Cultural, Technological and Scholarly Phenomenon.” Information, Communication Society 15 (5): 662–679. doi:10.1080/1369118X.2012.678878. Buchanan, Bruce G. 1985. “Expert systems.” Journal of Automated Reasoning 1 (1): 28–35. Cuillier, David. 2011. The Art of Access: Strategies for Acquiring Public Records. Washington, DC: CQ Press. Dawsey, Chastity Pratt. 2008. “Unsecured Schools given up to Thieves, Vandals.” Detroit Free Press, April 4. http://www.freep.com/apps/pbcs.dll/article?AID=/20080404/NEWS01/ 804040302. Department of Human Services, City of Philadelphia. 2012. 2011 Annual Report. Annual Report. http://www.phila.gov/dhs/pdfs/DHS%20Annual%20report.pdf. Department of Human Services, City of Philadelphia. 2014. “Children and Youth Division Home Page.” http://dhs.phila.gov/intranet/pgintrahome_pub.nsf/content/cydhomepage. 16 MEREDITH BROUSSARD Downloaded by [Temple University Libraries] at 08:32 15 December 2014
  • 17. Diakopoulos, Nicholas. 2013. “Rage against the Algorithms.” The Atlantic, October 3. http:// www.theatlantic.com/technology/archive/2013/10/rage-against-the-algorithms/280255/. Diakopoulos, Nicholas. 2014. “Algorithmic Accountability Reporting: On the Investigation of Black Boxes”. Tow Center for Digital Journalism at Columbia University. http://towcen ter.org/wp-content/uploads/2014/02/78524_Tow-Center-Report-WEB-1.pdf. Diaz, Jessica. 2013. Personal Communication. Dick, Murray. 2013, September. “Interactive Infographics and News Values.” Digital Journal- ism: 1–17. doi:10.1080/21670811.2013.841368. Domingo, David. 2008. “Interactivity in the Daily Routines of Online Newsrooms: Dealing with an Uncomfortable Myth.” Journal of Computer-Mediated Communication 13 (3): 680–704. doi:10.1111/j.1083-6101.2008.00415.x. Feigenbaum, E.A. 1977. “The Art of Artificial Intelligence: Themes and Case Studies of Knowl- edge Engineering.” Proceedings UCAI 5. Cambridge, MA. Flaounas, Ilias, Omar Ali, Thomas Lansdall-Welfare, Tijl De Bie, Nick Mosdell, Justin Lewis, and Nello Cristianini. 2013. “Research Methods in the Age of Digital Journalism: Massive- Scale Automated Analysis of News-Content—Topics, Style and Gender.” Digital Journalism 1 (1): 102–116. doi:10.1080/21670811.2012.714928. Flew, Terry, Christina Spurgeon, Anna Daniel, and Adam Swift. 2012. “The Promise of Com- putational Journalism.” Journalism Practice 6 (2): 157–171. doi:10.1080/17512786.2011. 616655. Gans, Herbert J. 2004. Deciding What’s News: A Study of CBS Evening News, NBC Nightly News, Newsweek and Time / Herbert J. Gans. Visions of the American Press. Evanston, Ill: Northwestern University Press. Griffioen, James D. 2008. “The Knowledge of What Happened and What Will.” Sweet Juniper. http://www.sweet-juniper.com/2008/04/knowledge-of-what-happened-and-what.html. Grimmer, Justin, and Brandon M. Stewart. 2013. 
“Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis 21 (3): 267–297. doi:10.1093/pan/mps028. Hamilton, James T., and Fred Turner. 2009. Accountability through Algorithm: Developing the Field of Computational Journalism. Developing the Field of Computational Journalism. Center For Advanced Study in the Behavioral Sciences Summer Workshop: Stanford University. http://www.stanford.edu/~fturner/Hamilton%20Turner%20Acc%20by%20Alg %20Final.pdf. Hansen, Kathleen A. 1991. “Source Diversity and Newspaper Enterprise Journalism.” Journalism Mass Communication Quarterly 68 (3): 474–482. doi:10.1177/107769909106800318. Houston, Brant and Investigative Reporters and Editors, Inc. 2009. The Investigative Reporter’s Handbook: A Guide to Documents, Databases and Techniques. 5th ed. , edited by Brant Houston, Investigative Reporters and Editors, Inc. Boston, MA: Bedford/St. Martin’s. Howard, Alexander Benjamin. 2014. The Art Science of Data-Driven Journalism. Tow/Knight Reports. Tow Center for Digital Journalism: Columbia University. http://towcenter.org/ wp-content/uploads/2014/05/Tow-Center-Data-Driven-Journalism.pdf. June, Laura. 2013. “Maura Johnston on Why She Opened Her IPad-Only Magazine to the Web.” The Verge, July 10. http://www.theverge.com/2013/7/10/4506824/maura-john ston-on-why-she-opened-her-ipad-only-magazine-to-the-web. Labbé, Theola and Dion Haynes, V. 2007. “Rhee Blasts Textbook Process for Letting Supplies Languish.” The Washington Post, August 4. http://www.washingtonpost.com/wp-dyn/ content/article/2007/08/03/AR2007080302134_pf.html. ARTIFICIAL INTELLIGENCE FOR INVESTIGATIVE REPORTING 17 Downloaded by [Temple University Libraries] at 08:32 15 December 2014
López-Ortega, Omar. 2013. “Computer-Assisted Creativity: Emulation of Cognitive Processes on a Multi-Agent System.” Expert Systems with Applications 40 (9): 3459–3470. doi:10.1016/j.eswa.2012.12.054.
Marburger, David. 2011. Access with Attitude: An Advocate’s Guide to Freedom of Information in Ohio. Athens: Ohio University Press.
McChesney, Robert W. 2012. “Farewell to Journalism?: Time for a Rethinking.” Journalism Practice 6 (5–6): 614–626. doi:10.1080/17512786.2012.683273.
Meyer, Philip. 2002. Precision Journalism: A Reporter’s Introduction to Social Science Methods. 4th ed. Lanham, Md: Rowman & Littlefield.
Obama, Barack. 2009. “Memorandum for the Heads of Executive Departments and Agencies Re: Transparency and Open Government.” Federal Register. http://www.whitehouse.gov/the_press_office/TransparencyandOpenGovernment.
Parasie, S., and E. Dagiral. 2012. “Data-Driven Journalism and the Public Good: ‘Computer-Assisted-Reporters’ and ‘Programmer-Journalists’ in Chicago.” New Media & Society 15 (6): 853–871. doi:10.1177/1461444812463345.
Pavlik, John V. 2013. “Innovation and the Future of Journalism.” Digital Journalism 1 (2): 181–193. doi:10.1080/21670811.2012.756666.
Peters, Jeremy W. 2010. “In a World of Online News, Burnout Starts Younger.” The New York Times, July 18. http://www.nytimes.com/2010/07/19/business/media/19press.html.
Protess, David. 1991. The Journalism of Outrage: Investigative Reporting and Agenda Building in America. New York: Guilford Press.
Remler, Dahlia K., Don J. Waisanen, and Andrea Gabor. 2013. “Academic Journalism: A Modest Proposal.” Journalism Studies, August, 1–17. doi:10.1080/1461670X.2013.821321.
Royal, Cindy. 2010. “The Journalist as Programmer: A Case Study of the New York Times Interactive News Technology Department.” The University of Texas at Austin. https://online.journalism.utexas.edu/2010/papers/Royal10.pdf.
Scribner, S., and M. Cole. 1973. “Cognitive Consequences of Formal and Informal Education: New Accommodations Are Needed between School-Based Learning and Learning Experiences of Everyday Life.” Science 182 (4112): 553–559. doi:10.1126/science.182.4112.553.
Sternberg, Robert J., ed. 1999. Handbook of Creativity. Cambridge, U.K.; New York: Cambridge University Press.
Sweeney, Latanya. 2013. “Discrimination in Online Ad Delivery.” Communications of the ACM 56 (5): 44. doi:10.1145/2447976.2447990.
Tuchman, Gaye. 1978. Making News: A Study in the Construction of Reality. New York: Free Press.

Meredith Broussard, Department of Journalism, Temple University, USA. E-mail: merbroussard@temple.edu. Web: http://meredithbroussard.com