
Monitoring, Evaluation & Data

EVEN 2909: Introduction to Sustainability Engineering — Week 13

University of Colorado Boulder

Why M&E Matters

“Without data, you’re just another person with an opinion.” — W. Edwards Deming

Monitoring and Evaluation (M&E) is the systematic process of collecting and analyzing data to determine whether an intervention is working, for whom, and why. It serves three core purposes:

Accountability
Are we doing what we promised? Are resources being used effectively? M&E provides evidence to donors, governments, and communities that investments are producing results.
Learning
What’s working and what isn’t? M&E enables adaptive management — adjusting course based on evidence rather than assumptions. Fail fast, learn faster.
Evidence-Based Decisions
Which interventions should be scaled? Which should be abandoned? Without rigorous evidence, we risk scaling things that don’t work and missing things that do.

For sustainability engineers: Good intentions are not enough. Your designs must be validated with data. M&E is how you prove (or disprove) that your technology actually creates the impact you intend.

Theory of Change

A Theory of Change (ToC) is a roadmap that explains how and why you expect your intervention to produce a desired outcome. It makes your assumptions explicit and testable.

Problem (what needs to change?) → Intervention (what are you doing?) → Mechanism (how does it work?) → Outcome (what changes?) → Impact (why does it matter?)

If-Then Logic

IF we install chlorine dosing systems at rural water points, AND communities are trained to maintain them, THEN water quality will improve at the point of collection, WHICH LEADS TO reduced waterborne disease, BECAUSE pathogens are eliminated before consumption.

Critical Assumptions

  • Each “if → then” link relies on assumptions that may or may not hold
  • Example assumptions: communities will accept chlorinated water; supply chains for chlorine exist; dosing systems will be maintained
  • The ToC forces you to identify these assumptions before you start, not after you fail
  • M&E then tests whether each assumption actually holds in practice
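One way to make those assumptions impossible to ignore is to encode each causal link together with the assumptions it rests on, so every link becomes a test plan. A minimal sketch, assuming nothing beyond the standard library (the class and field names are illustrative, not a standard M&E tool):

```python
from dataclasses import dataclass, field

@dataclass
class CausalLink:
    """One IF -> THEN step in a Theory of Change."""
    if_condition: str
    then_result: str
    assumptions: list[str] = field(default_factory=list)

# The chlorine-dosing ToC from above, with its assumptions spelled out
toc = [
    CausalLink(
        if_condition="Chlorine dosing systems installed and communities trained",
        then_result="Water quality improves at the point of collection",
        assumptions=["Communities accept chlorinated water",
                     "Chlorine supply chains exist",
                     "Dosing systems are maintained"],
    ),
    CausalLink(
        if_condition="Water quality improves at the point of collection",
        then_result="Waterborne disease is reduced",
        assumptions=["Households actually drink the treated water",
                     "No major recontamination before consumption"],
    ),
]

# M&E then becomes a checklist: every assumption needs an indicator
for link in toc:
    print(f"TEST: {link.if_condition} -> {link.then_result}")
    for a in link.assumptions:
        print(f"  assumption to verify: {a}")
```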

Sources: Weiss, Evaluation 1997; USAID Theory of Change guidance; Taplin et al., 2013

The Logframe

The Logical Framework (logframe) translates a Theory of Change into a structured table that connects activities to impact through a chain of results.

Level | Description | Indicator | Data Source
Impact | Reduced child mortality from diarrheal disease | Under-5 diarrhea mortality rate | National health statistics
Outcome | Households consume safe drinking water | % of households with <1 CFU/100 mL E. coli | Household water quality testing
Output | Water treatment systems installed and functional | # of systems installed; % functional at 6 months | Field monitoring reports; sensor data
Activity | Train technicians; procure materials; install systems | # technicians trained; # systems procured | Training records; procurement logs
Input | Funding, staff, equipment, partnerships | Budget spent; staff FTEs | Financial reports

The logframe gap: Most projects are good at tracking inputs and activities (we spent the money, we built the thing). Far fewer track outcomes and impact (did it actually improve lives?). That gap is where M&E adds the most value.
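The logframe translates naturally into structured data, and the logframe gap then becomes something you can audit: which levels actually have data behind their indicators? A minimal sketch with made-up status flags (the field names are illustrative):

```python
# Each row of the logframe above, with a flag for whether data actually exists
logframe = [
    {"level": "Impact",   "indicator": "Under-5 diarrhea mortality rate",         "data_collected": False},
    {"level": "Outcome",  "indicator": "% households with <1 CFU/100 mL E. coli", "data_collected": False},
    {"level": "Output",   "indicator": "% systems functional at 6 months",        "data_collected": True},
    {"level": "Activity", "indicator": "# technicians trained",                   "data_collected": True},
    {"level": "Input",    "indicator": "Budget spent",                            "data_collected": True},
]

# The typical pattern: inputs and activities tracked, outcomes and impact not
gap = [row["level"] for row in logframe if not row["data_collected"]]
print("Levels with no data behind their indicators:", gap)  # ['Impact', 'Outcome']
```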

Sources: USAID Logframe guidance; World Bank Operations Manual

Choosing Indicators

Indicators are the specific, measurable signals that tell you whether your intervention is on track. Good indicators follow the SMART criteria:

  • S (Specific): clear and unambiguous
  • M (Measurable): quantifiable or observable
  • A (Achievable): realistic given resources
  • R (Relevant): connected to the outcome
  • T (Time-bound): has a deadline

Quantitative vs. Qualitative

  • Quantitative: numbers — liters of water treated, tonnes of CO₂ reduced, % of systems functional
  • Qualitative: perceptions, stories, context — user satisfaction, community attitudes, barriers to adoption
  • Both are essential. Numbers tell you what; stories tell you why.

Proxy Indicators

  • Sometimes you can’t measure the outcome directly
  • Example: can’t measure diarrhea reduction directly? Use water quality at point of consumption as a proxy
  • Example: can’t measure deforestation in real time? Use satellite-derived tree cover change as a proxy
  • Proxies must be validated — does the proxy actually correlate with the outcome?
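Validating a proxy is, at minimum, a correlation check against the outcome wherever both can be measured. A minimal sketch with fabricated numbers (in practice you would use paired field measurements):

```python
import numpy as np

# Paired measurements from sites where BOTH proxy and outcome were observed:
# proxy   = E. coli count at point of consumption (CFU/100 mL)
# outcome = diarrhea episodes per 100 child-months
proxy   = np.array([0, 1, 3, 10, 25, 50, 80, 120])
outcome = np.array([2, 3, 4,  6,  9, 11, 15,  18])

r = np.corrcoef(proxy, outcome)[0, 1]
print(f"Pearson r = {r:.2f}")  # a strong positive correlation supports the proxy

# A proxy that doesn't track the outcome is worse than no indicator at all:
# it gives false confidence. Check r (and its sign) before relying on it.
```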

Sources: UNDP M&E Handbook; Bamberger et al., RealWorld Evaluation 2012

Data Collection Methods

Surveys & Interviews
Structured questionnaires for quantitative data; semi-structured interviews for depth. Mobile data collection (KoboToolbox, ODK) has transformed field surveys. Key challenges: recall bias, social desirability bias, sampling.
Sensors & IoT
Automated, continuous, objective data. Water flow meters, air quality monitors, energy meters, soil moisture probes. Eliminates recall bias and enables real-time monitoring. Challenges: cost, maintenance, connectivity, data management.
Remote Sensing
Satellite imagery for land use, deforestation, crop health, urban growth, water body extent. Drone imagery for site-level monitoring. Advantages: broad coverage, historical baselines, objective. Limitations: cloud cover, resolution, ground-truthing needed.
Administrative Data
Existing records: health facility visits, school enrollment, utility billing, government registries. Low cost but often incomplete, inconsistent, or delayed. Increasingly digitized and linkable.
Participatory Methods
Focus groups, community mapping, photovoice, most significant change stories. Centers community voices and captures context that surveys miss. Essential for understanding why something works or doesn’t.

Mixed methods: The strongest M&E combines quantitative data (what and how much) with qualitative data (why and how). Neither alone tells the full story.

Sources: Patton, Qualitative Research & Evaluation Methods 2014; J-PAL Research Resources

Impact Evaluation Methods

The fundamental question of impact evaluation: What would have happened without the intervention? This is the counterfactual — and it’s the hardest thing in evaluation to establish.

Gold Standard: RCTs

Randomized Controlled Trials
Randomly assign treatment and control groups. Any difference in outcomes can be attributed to the intervention (not to pre-existing differences). The 2019 Nobel Prize in Economics went to Banerjee, Duflo, and Kremer for pioneering RCTs in development.
  • Strengths: strongest causal evidence; eliminates selection bias
  • Limitations: expensive, slow, ethical concerns (withholding treatment), may not generalize to other contexts
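Randomization itself is simple; the power of an RCT comes from what random assignment buys you. A minimal sketch of assigning villages to arms (the village names and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the assignment is reproducible and auditable

villages = [f"village_{i:02d}" for i in range(20)]
shuffled = rng.permutation(villages)

treatment, control = shuffled[:10], shuffled[10:]
print("Treatment:", sorted(treatment))
print("Control:  ", sorted(control))

# Because assignment is random, the two groups differ only by chance at baseline,
# so a later difference in outcomes can be attributed to the intervention.
```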

Quasi-Experimental Methods

When Randomization Isn’t Possible
Use statistical techniques to approximate a counterfactual from observational data. Less rigorous than RCTs but often more feasible and ethical.
  • Difference-in-differences (DiD): compare changes over time between treatment and comparison groups (see the sketch after this list)
  • Regression discontinuity: exploit a cutoff (e.g., eligibility threshold) to compare those just above and below
  • Propensity score matching: statistically match treated and untreated units on observable characteristics
  • Before-after: simplest but weakest — no control for external changes
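A minimal difference-in-differences sketch with fabricated group means (real analyses use regression with standard errors, e.g. via statsmodels, but the logic is this arithmetic):

```python
# Mean outcome (e.g., liters of safe water per household per day); fabricated numbers
treat_pre, treat_post = 12.0, 20.0   # treatment group, before and after
ctrl_pre,  ctrl_post  = 11.5, 15.0   # comparison group, before and after

# A naive before-after on the treatment group alone overstates the effect,
# because some of the change would have happened anyway (secular trend).
naive = treat_post - treat_pre                              # 8.0
did   = (treat_post - treat_pre) - (ctrl_post - ctrl_pre)   # 8.0 - 3.5 = 4.5

print(f"Before-after estimate: {naive:.1f}")
print(f"DiD estimate:          {did:.1f}")
# Key assumption: parallel trends. Absent the intervention, both groups
# would have changed by the same amount.
```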

Sources: Gertler et al., Impact Evaluation in Practice (World Bank) 2016; J-PAL Handbook

The Replication Crisis

Science is built on reproducibility — but a growing body of evidence shows that many published findings cannot be replicated. This has profound implications for evidence-based sustainability.

  • 50–70%: share of published studies that fail to replicate
  • p < 0.05: an arbitrary significance threshold
  • $28B: wasted on irreproducible research per year (US)

Why It Happens

  • P-hacking: running multiple analyses until you find p < 0.05; selectively reporting “significant” results (simulated in the sketch after this list)
  • Publication bias: journals overwhelmingly publish positive results; negative findings go in the “file drawer”
  • Small sample sizes: underpowered studies produce noisy, unreliable estimates
  • HARKing: Hypothesizing After Results are Known — presenting exploratory findings as confirmatory
  • Perverse incentives: careers rewarded for novel, positive, “significant” findings rather than rigorous replication
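You can see why p-hacking works by simulating it: test enough pure noise and something “significant” appears. A minimal sketch using scipy (no real effect exists anywhere in this data):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(seed=0)
n_experiments, n_tests, alpha = 1000, 20, 0.05

false_positive_runs = 0
for _ in range(n_experiments):
    # 20 candidate outcome variables, all pure noise: the true effect is exactly zero
    found = False
    for _ in range(n_tests):
        a = rng.normal(size=30)
        b = rng.normal(size=30)
        if ttest_ind(a, b).pvalue < alpha:
            found = True
            break  # the p-hacker stops at the first "significant" result
    false_positive_runs += found

print(f"At least one p < {alpha} in {false_positive_runs / n_experiments:.0%} of runs")
# With 20 independent tests, expect roughly 1 - 0.95**20 ≈ 64% of runs to
# produce a "finding" despite zero true effects.
```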

What to Do About It

  • Pre-registration: publicly declare hypotheses and methods before collecting data
  • Open data & open code: let others verify your analysis
  • Effect sizes over p-values: report how big the effect is, not just whether it crosses a significance threshold (see the sketch after this list)
  • Systematic reviews & meta-analyses: synthesize evidence across many studies
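Reporting an effect size is a short calculation, not a burden. A minimal Cohen's d sketch with fabricated samples:

```python
import numpy as np

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

rng = np.random.default_rng(seed=1)
treated = rng.normal(loc=10.5, scale=2.0, size=200)   # fabricated outcomes
control = rng.normal(loc=10.0, scale=2.0, size=200)

print(f"Cohen's d = {cohens_d(treated, control):.2f}")
# d ≈ 0.25 here: a large enough sample can make a small effect "significant",
# which is exactly why the size of the effect, not just p, should be reported.
```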

Sources: Ioannidis, PLoS Medicine 2005; Open Science Collaboration, Science 2015

Digital MRV

Measurement, Reporting, and Verification (MRV) is the backbone of carbon markets, climate finance, and environmental compliance. Digital tools are transforming MRV from periodic, manual auditing to continuous, automated monitoring.

IoT Sensors
Real-time monitoring of water quality, energy consumption, methane emissions, and land use. Sensors transmit data automatically, eliminating manual data collection. Example: Virridy’s Lume sensor monitors water system usage in real time across thousands of sites.
Satellite + AI
Machine learning applied to satellite imagery can detect deforestation, measure crop yields, track methane plumes, and verify reforestation projects at scale. Companies like Pachama and Chloris use this for forest carbon verification.
Blockchain & Transparency
Immutable ledgers for carbon credit transactions. Prevents double-counting. Smart contracts automate credit issuance when sensor data confirms emissions reductions. Still early-stage but growing rapidly.

Connection to carbon markets: The credibility crisis in carbon markets (recall Week 12) stems partly from weak MRV. If you can’t accurately measure and verify emissions reductions, credits have no integrity. Digital MRV is the technological solution to this trust deficit.

Sources: World Bank Digital MRV report; Gold Standard Digital MRV framework; Virridy

Geospatial Tools

Geographic information systems (GIS) and remote sensing have democratized access to spatial data, enabling sustainability analysis at scales from local to global.

GIS & Mapping
Geographic Information Systems overlay spatial data layers: population density, water resources, land use, infrastructure, climate risk. QGIS (free, open-source) and ArcGIS (industry standard). Essential for site selection, resource planning, and equity analysis.
Google Earth Engine
Free cloud-based platform with petabytes of satellite imagery and geospatial datasets. Enables planetary-scale analysis without downloading data. Used for deforestation monitoring, flood mapping, urban expansion, crop yield estimation, and more.
Our World in Data
Free, open-access data visualization platform covering sustainability topics: emissions, energy, food, health, poverty. Interactive charts with downloadable data. An essential resource for evidence-based arguments and teaching.
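As a taste of what Earth Engine enables, here is a minimal Python API sketch that sums forest loss inside a region. It assumes the earthengine-api package, an authenticated account, and the UMD Hansen Global Forest Change asset; the asset ID, version, and band names below follow the published dataset but should be checked against the current catalog:

```python
import ee

ee.Initialize()  # requires prior `earthengine authenticate` (setup assumed)

# Hansen Global Forest Change (verify the asset ID/version in the data catalog)
gfc = ee.Image("UMD/hansen/global_forest_change_2023_v1_11")
loss = gfc.select("loss")  # 1 where tree cover was lost since 2000

# Illustrative bounding box (lon/lat) over a hypothetical study area
region = ee.Geometry.Rectangle([29.0, -2.5, 30.0, -1.5])

# Sum loss pixels, weighted by pixel area, to estimate hectares lost
loss_area_ha = (
    loss.multiply(ee.Image.pixelArea()).divide(10_000)
        .reduceRegion(reducer=ee.Reducer.sum(), geometry=region,
                      scale=30, maxPixels=1e9)
        .getInfo()
)
print("Approx. forest loss (ha):", loss_area_ha)
```

The computation runs on Google's servers; only the final sum is downloaded, which is what makes planetary-scale analysis feasible without local data.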

Other Key Platforms

  • NASA SEDAC: Socioeconomic Data and Applications Center — population, hazards, land use
  • Global Forest Watch: near-real-time deforestation alerts from satellite data
  • Climate TRACE: independent GHG emissions tracking for every country and major facility
  • OpenStreetMap: crowdsourced global mapping — critical in data-poor regions

Sources: Google Earth Engine; Our World in Data; NASA SEDAC; Global Forest Watch

Data Visualization

Visualization is how data becomes knowledge. Good charts inform; bad charts mislead. As engineers, your ability to communicate data visually is as important as your ability to collect it.

Tufte’s Principles

  • Data-ink ratio: maximize the ink devoted to data; minimize non-data ink (gridlines, boxes, decoration)
  • Chartjunk: 3D effects, unnecessary legends, gradient fills, and clip art distract from the data
  • Small multiples: repeat the same chart structure across categories for easy comparison
  • Show the data: don’t hide individual points behind summary statistics

Common Pitfalls

  • Truncated y-axes: starting at non-zero exaggerates differences
  • Dual y-axes: almost always misleading — implies correlation where none exists
  • Pie charts: humans are bad at comparing angles; use bar charts instead
  • Cherry-picked time frames: selecting start/end dates to support a narrative

Charts That Inform

Best Practices
Label axes clearly with units. Include data sources. Use colorblind-friendly palettes. Title charts with the takeaway, not the topic (“Solar costs fell 90% since 2010” not “Solar costs over time”). Annotate key inflection points. Let the reader see the pattern.
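A minimal matplotlib sketch applying those practices. The cost figures are illustrative stand-ins for the solar cost decline mentioned above, not a real dataset:

```python
import matplotlib.pyplot as plt

# Illustrative utility-scale solar cost figures ($/MWh), NOT real data
years = [2010, 2012, 2014, 2016, 2018, 2020]
cost  = [ 378,  288,  193,  125,   68,   37]

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(years, cost, marker="o", color="#1a7a4a")

ax.set_title("Solar costs fell ~90% between 2010 and 2020")  # takeaway, not topic
ax.set_xlabel("Year")
ax.set_ylabel("Levelized cost of electricity ($/MWh)")
ax.set_ylim(bottom=0)                                # no truncated y-axis
ax.spines[["top", "right"]].set_visible(False)       # minimize non-data ink
ax.annotate("cheaper than new coal in most markets", # annotate the key point
            xy=(2020, 37), xytext=(2014.5, 120),
            arrowprops=dict(arrowstyle="->"))
fig.text(0.01, 0.01, "Source: illustrative data for this example", fontsize=8)

plt.tight_layout()
plt.show()
```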

Tools

  • Python: matplotlib, seaborn, plotly (interactive)
  • R: ggplot2 (grammar of graphics)
  • JavaScript: D3.js, Chart.js, Observable
  • No-code: Datawrapper, Flourish, Google Sheets

Sources: Tufte, The Visual Display of Quantitative Information 2001; Schwabish, Better Data Visualizations 2021

Data Ethics

Collecting data about people creates power dynamics and responsibilities. Ethical M&E requires centering the rights and dignity of the people whose data you collect.

Informed Consent
Participants must understand what data is being collected, how it will be used, who will see it, and that they can refuse without consequences. Consent must be truly voluntary — not coerced by power imbalances (e.g., a community dependent on your organization for services).
Data Privacy & Security
Personal data must be anonymized or pseudonymized. GPS coordinates of vulnerable communities can put people at risk. Health data, income data, and behavioral data require special protections. GDPR (EU) and similar regulations set legal minimums, but ethical standards should exceed them.
Community Ownership of Data
Who owns the data: the funder, the researcher, or the community? Indigenous Data Sovereignty (CARE principles) asserts that communities should control data about them. Data should benefit the community, not just extract information from it.
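A minimal pseudonymization sketch using only the standard library. Note the hedge built into the code: a salted hash is pseudonymization, not anonymization, and the field names are illustrative:

```python
import hashlib
import secrets

# The salt must be stored separately from the data and never published
SALT = secrets.token_hex(16)

def pseudonymize(participant_id: str) -> str:
    """Replace an identifier with a stable, non-reversible token."""
    return hashlib.sha256((SALT + participant_id).encode()).hexdigest()[:12]

def coarsen_gps(lat: float, lon: float, decimals: int = 2) -> tuple[float, float]:
    """Round coordinates to roughly 1 km so households can't be pinpointed."""
    return round(lat, decimals), round(lon, decimals)

record = {"id": pseudonymize("Jane Doe, Village A"),
          "gps": coarsen_gps(-1.95032, 30.05871)}
print(record)
# Pseudonymization alone is NOT anonymization: combinations of other fields
# (village, age, occupation) can still re-identify people. Minimize what you collect.
```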

The extractive pattern: Too often, researchers collect data from communities, publish papers, advance their careers, and never return results to the people who provided them. Ethical M&E requires reciprocity.

Sources: CARE Principles for Indigenous Data Governance; Belmont Report; GDPR

Case Study: Virridy’s Rwanda Program

A real-world example of how rigorous M&E transformed a water technology intervention from an assumption into peer-reviewed evidence.

The Intervention

  • Virridy (formerly SweetSense) deployed IoT sensors on community water systems across rural Rwanda
  • Sensors monitored water flow in real time, detecting breakdowns within hours instead of weeks
  • Rapid repair response reduced downtime from weeks or months to days
  • Paired with a randomized controlled trial to measure health impact
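The core of sensor-based functionality monitoring is simple in principle: flag a system as down when flow stops, and turn continuous data into an uptime indicator and an alert. A minimal sketch over daily flow readings; the threshold and data are illustrative, and Virridy's actual pipeline is certainly more sophisticated:

```python
# Daily flow readings (liters/day) from one water point; zeros indicate a breakdown
daily_flow = [950, 1010, 980, 0, 0, 0, 0, 890, 940, 0, 0, 915]
THRESHOLD = 50  # below this, treat the system as non-functional (illustrative)

down_days = sum(flow < THRESHOLD for flow in daily_flow)
uptime = 1 - down_days / len(daily_flow)
print(f"Uptime: {uptime:.0%}")  # feeds the '% functional' output indicator

# Alerting: notify the repair team after 2 consecutive low-flow days,
# turning weeks of unnoticed downtime into a response measured in days
streak = 0
for day, flow in enumerate(daily_flow, start=1):
    streak = streak + 1 if flow < THRESHOLD else 0
    if streak == 2:
        print(f"Day {day}: dispatch repair team")
```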

The Theory of Change

IF sensors enable rapid detection of water system failures, AND repair teams respond quickly, THEN communities will have more consistent access to safe water, WHICH LEADS TO reduced waterborne disease.

The Evidence

Published in The Lancet Global Health (Thomas et al.)
The RCT demonstrated a 29% reduction in diarrheal disease in children under 5 in communities with sensor-monitored water systems compared to control communities. This was one of the first studies to show that monitoring infrastructure functionality — not just building it — is what drives health outcomes.

Key M&E Lessons

  • Outputs are not outcomes: installing a water system (output) does not guarantee safe water access (outcome)
  • Sensors enabled the counterfactual: continuous data showed exactly when systems were working vs. broken
  • RCT design was critical: without randomization, the 29% reduction could not be causally attributed to the intervention
  • Data created accountability: real-time dashboards motivated faster repair response

Sources: Thomas et al., The Lancet Global Health; Virridy (virridy.com)

Application: Your Theory of Change

For your course project, develop a Theory of Change that maps how your proposed intervention leads to the sustainability impact you claim.

Step 1: Define the Problem
What specific sustainability challenge are you addressing? Who is affected? What is the current situation? Be specific — not “climate change” but “high energy consumption in CU residence halls due to outdated HVAC systems.”
Step 2: Describe Your Intervention
What exactly are you proposing? What activities will you undertake? What resources do you need? Be concrete enough that someone else could implement it.
Step 3: Map the Causal Chain
IF [your intervention] THEN [immediate output] WHICH LEADS TO [outcome] BECAUSE [mechanism]. Identify every assumption in the chain. Which are strongest? Which are most uncertain?
Step 4: Choose Indicators
For each level (output, outcome, impact), define a SMART indicator. How would you measure it? What data would you need? What’s your baseline?
Step 5: Design Your Evaluation
What is your counterfactual? How would you know if the observed change was caused by your intervention and not something else? Even a simple before-after comparison with a control group is better than nothing.

Key Takeaways

1. Good intentions require good evidence
Many well-meaning sustainability interventions fail or have unintended consequences. M&E is how you tell the difference between what works and what feels good.
2. A Theory of Change makes your logic testable
By making assumptions explicit, you can identify the weakest links in your causal chain before investing millions of dollars.
3. Digital tools are transforming M&E
IoT sensors, satellite imagery, and AI enable continuous, objective, real-time monitoring at scales previously impossible. This is the future of accountability.
4. Beware of bad evidence
The replication crisis shows that even published, peer-reviewed research can be wrong. Look for pre-registered studies, large samples, and independent replication.
5. Data is power — use it ethically
Informed consent, data privacy, and community ownership are not optional. The people whose data you collect should benefit from it.

Further Reading

  • Gertler, P. et al. (2016). Impact Evaluation in Practice. World Bank (free download).
  • Banerjee, A. & Duflo, E. (2011). Poor Economics: A Radical Rethinking of the Way to Fight Global Poverty.
  • Tufte, E. (2001). The Visual Display of Quantitative Information. 2nd edition.
  • Our World in Data: ourworldindata.org
  • Ioannidis, J. (2005). “Why Most Published Research Findings Are False.” PLoS Medicine, 2(8).
  • Thomas, E. et al. “Monitoring and evaluation of water systems in Rwanda.” The Lancet Global Health.

Assignment: Draft a 1-page Theory of Change for your course project. Include: problem statement, intervention description, causal chain with assumptions, and at least 3 SMART indicators at different levels of the logframe.