
Monitoring, Evaluation & Data

EVEN 2909: Introduction to Sustainability Engineering — Week 13

University of Colorado Boulder

Why M&E Matters

“Without data, you’re just another person with an opinion.” — W. Edwards Deming

Monitoring and Evaluation (M&E) is the systematic process of collecting and analyzing data to determine whether an intervention is working, for whom, and why. It serves three core purposes:

Accountability
Are we doing what we promised? Are resources being used effectively? M&E provides evidence to donors, governments, and communities that investments are producing results.
Learning
What’s working and what isn’t? M&E enables adaptive management — adjusting course based on evidence rather than assumptions. Fail fast, learn faster.
Evidence-Based Decisions
Which interventions should be scaled? Which should be abandoned? Without rigorous evidence, we risk scaling things that don’t work and missing things that do.

For sustainability engineers: Good intentions are not enough. Your designs must be validated with data. M&E is how you prove (or disprove) that your technology actually creates the impact you intend.

Theory of Change

A Theory of Change (ToC) is a roadmap that explains how and why you expect your intervention to produce a desired outcome. It makes your assumptions explicit and testable.

Problem (what needs to change?) → Intervention (what are you doing?) → Mechanism (how does it work?) → Outcome (what changes?) → Impact (why does it matter?)

If-Then Logic

IF we install chlorine dosing systems at rural water points, AND communities are trained to maintain them, THEN water quality will improve at the point of collection, WHICH LEADS TO reduced waterborne disease, BECAUSE pathogens are eliminated before consumption.

Critical Assumptions

  • Each “if → then” link relies on assumptions that may or may not hold
  • Example assumptions: communities will accept chlorinated water; supply chains for chlorine exist; dosing systems will be maintained
  • The ToC forces you to identify these assumptions before you start, not after you fail
  • M&E then tests whether each assumption actually holds in practice
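One way to make those assumptions impossible to ignore is to encode each causal link together with the assumptions it rests on, so every link becomes a test plan. A minimal sketch, assuming nothing beyond the standard library (the class and field names are illustrative, not a standard M&E tool):

```python
from dataclasses import dataclass, field

@dataclass
class CausalLink:
    """One IF -> THEN step in a Theory of Change."""
    if_condition: str
    then_result: str
    assumptions: list[str] = field(default_factory=list)

# The chlorine-dosing ToC from above, with its assumptions spelled out
toc = [
    CausalLink(
        if_condition="Chlorine dosing systems installed and communities trained",
        then_result="Water quality improves at the point of collection",
        assumptions=["Communities accept chlorinated water",
                     "Chlorine supply chains exist",
                     "Dosing systems are maintained"],
    ),
    CausalLink(
        if_condition="Water quality improves at the point of collection",
        then_result="Waterborne disease is reduced",
        assumptions=["Households actually drink the treated water",
                     "No major recontamination before consumption"],
    ),
]

# M&E then becomes a checklist: every assumption needs an indicator
for link in toc:
    print(f"TEST: {link.if_condition} -> {link.then_result}")
    for a in link.assumptions:
        print(f"  assumption to verify: {a}")
```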

Sources: Weiss, Evaluation 1997; USAID Theory of Change guidance; Taplin et al., 2013

The Logframe

The Logical Framework (logframe) translates a Theory of Change into a structured table that connects activities to impact through a chain of results.

Level | Description | Indicator | Data Source
Impact | Reduced child mortality from diarrheal disease | Under-5 diarrhea mortality rate | National health statistics
Outcome | Households consume safe drinking water | % of households with <1 CFU/100 mL E. coli | Household water quality testing
Output | Water treatment systems installed and functional | # of systems installed; % functional at 6 months | Field monitoring reports; sensor data
Activity | Train technicians; procure materials; install systems | # technicians trained; # systems procured | Training records; procurement logs
Input | Funding, staff, equipment, partnerships | Budget spent; staff FTEs | Financial reports

The logframe gap: Most projects are good at tracking inputs and activities (we spent the money, we built the thing). Far fewer track outcomes and impact (did it actually improve lives?). That gap is where M&E adds the most value.
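The logframe translates naturally into structured data, and the logframe gap then becomes something you can audit: which levels actually have data behind their indicators? A minimal sketch with made-up status flags (the field names are illustrative):

```python
# Each row of the logframe above, with a flag for whether data actually exists
logframe = [
    {"level": "Impact",   "indicator": "Under-5 diarrhea mortality rate",         "data_collected": False},
    {"level": "Outcome",  "indicator": "% households with <1 CFU/100 mL E. coli", "data_collected": False},
    {"level": "Output",   "indicator": "% systems functional at 6 months",        "data_collected": True},
    {"level": "Activity", "indicator": "# technicians trained",                   "data_collected": True},
    {"level": "Input",    "indicator": "Budget spent",                            "data_collected": True},
]

# The typical pattern: inputs and activities tracked, outcomes and impact not
gap = [row["level"] for row in logframe if not row["data_collected"]]
print("Levels with no data behind their indicators:", gap)  # ['Impact', 'Outcome']
```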

Sources: USAID Logframe guidance; World Bank Operations Manual

Choosing Indicators

Indicators are the specific, measurable signals that tell you whether your intervention is on track. Good indicators follow the SMART criteria:

  • S (Specific): clear and unambiguous
  • M (Measurable): quantifiable or observable
  • A (Achievable): realistic given resources
  • R (Relevant): connected to the outcome
  • T (Time-bound): has a deadline

Quantitative vs. Qualitative

  • Quantitative: numbers — liters of water treated, tonnes of CO₂ reduced, % of systems functional
  • Qualitative: perceptions, stories, context — user satisfaction, community attitudes, barriers to adoption
  • Both are essential. Numbers tell you what; stories tell you why.

Proxy Indicators

  • Sometimes you can’t measure the outcome directly
  • Example: can’t measure diarrhea reduction directly? Use water quality at point of consumption as a proxy
  • Example: can’t measure deforestation in real time? Use satellite-derived tree cover change as a proxy
  • Proxies must be validated — does the proxy actually correlate with the outcome?
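Validating a proxy is, at minimum, a correlation check against the outcome wherever both can be measured. A minimal sketch with fabricated numbers (in practice you would use paired field measurements):

```python
import numpy as np

# Paired measurements from sites where BOTH proxy and outcome were observed:
# proxy   = E. coli count at point of consumption (CFU/100 mL)
# outcome = diarrhea episodes per 100 child-months
proxy   = np.array([0, 1, 3, 10, 25, 50, 80, 120])
outcome = np.array([2, 3, 4,  6,  9, 11, 15,  18])

r = np.corrcoef(proxy, outcome)[0, 1]
print(f"Pearson r = {r:.2f}")  # a strong positive correlation supports the proxy

# A proxy that doesn't track the outcome is worse than no indicator at all:
# it gives false confidence. Check r (and its sign) before relying on it.
```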

Sources: UNDP M&E Handbook; Bamberger et al., RealWorld Evaluation 2012

Data Collection Methods

Surveys & Interviews
Structured questionnaires for quantitative data; semi-structured interviews for depth. Mobile data collection (KoboToolbox, ODK) has transformed field surveys. Key challenges: recall bias, social desirability bias, sampling.
Sensors & IoT
Automated, continuous, objective data. Water flow meters, air quality monitors, energy meters, soil moisture probes. Eliminates recall bias and enables real-time monitoring. Challenges: cost, maintenance, connectivity, data management.
Remote Sensing
Satellite imagery for land use, deforestation, crop health, urban growth, water body extent. Drone imagery for site-level monitoring. Advantages: broad coverage, historical baselines, objective. Limitations: cloud cover, resolution, ground-truthing needed.
Administrative Data
Existing records: health facility visits, school enrollment, utility billing, government registries. Low cost but often incomplete, inconsistent, or delayed. Increasingly digitized and linkable.
Participatory Methods
Focus groups, community mapping, photovoice, most significant change stories. Centers community voices and captures context that surveys miss. Essential for understanding why something works or doesn’t.

Mixed methods: The strongest M&E combines quantitative data (what and how much) with qualitative data (why and how). Neither alone tells the full story.

Sources: Patton, Qualitative Research & Evaluation Methods 2014; J-PAL Research Resources

Impact Evaluation Methods

The fundamental question of impact evaluation: What would have happened without the intervention? This is the counterfactual — and it’s the hardest thing in evaluation to establish.

Gold Standard: RCTs

Randomized Controlled Trials
Randomly assign treatment and control groups. Any difference in outcomes can be attributed to the intervention (not to pre-existing differences). The 2019 Nobel Prize in Economics went to Banerjee, Duflo, and Kremer for pioneering RCTs in development.
  • Strengths: strongest causal evidence; eliminates selection bias
  • Limitations: expensive, slow, ethical concerns (withholding treatment), may not generalize to other contexts
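Randomization itself is simple; the power of an RCT comes from what random assignment buys you. A minimal sketch of assigning villages to arms (the village names and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the assignment is reproducible and auditable

villages = [f"village_{i:02d}" for i in range(20)]
shuffled = rng.permutation(villages)

treatment, control = shuffled[:10], shuffled[10:]
print("Treatment:", sorted(treatment))
print("Control:  ", sorted(control))

# Because assignment is random, the two groups differ only by chance at baseline,
# so a later difference in outcomes can be attributed to the intervention.
```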

Quasi-Experimental Methods

When Randomization Isn’t Possible
Use statistical techniques to approximate a counterfactual from observational data. Less rigorous than RCTs but often more feasible and ethical.
  • Difference-in-differences (DiD): compare changes over time between treatment and comparison groups (see the sketch after this list)
  • Regression discontinuity: exploit a cutoff (e.g., eligibility threshold) to compare those just above and below
  • Propensity score matching: statistically match treated and untreated units on observable characteristics
  • Before-after: simplest but weakest — no control for external changes
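A minimal difference-in-differences sketch with fabricated group means (real analyses use regression with standard errors, e.g. via statsmodels, but the logic is this arithmetic):

```python
# Mean outcome (e.g., liters of safe water per household per day); fabricated numbers
treat_pre, treat_post = 12.0, 20.0   # treatment group, before and after
ctrl_pre,  ctrl_post  = 11.5, 15.0   # comparison group, before and after

# A naive before-after on the treatment group alone overstates the effect,
# because some of the change would have happened anyway (secular trend).
naive = treat_post - treat_pre                              # 8.0
did   = (treat_post - treat_pre) - (ctrl_post - ctrl_pre)   # 8.0 - 3.5 = 4.5

print(f"Before-after estimate: {naive:.1f}")
print(f"DiD estimate:          {did:.1f}")
# Key assumption: parallel trends. Absent the intervention, both groups
# would have changed by the same amount.
```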

Sources: Gertler et al., Impact Evaluation in Practice (World Bank) 2016; J-PAL Handbook

The Replication Crisis

Science is built on reproducibility — but a growing body of evidence shows that many published findings cannot be replicated. This has profound implications for evidence-based sustainability.

  • 50–70%: share of published studies that fail to replicate
  • p < 0.05: an arbitrary significance threshold
  • $28B: wasted on irreproducible research per year (US)

Why It Happens

  • P-hacking: running multiple analyses until you find p < 0.05; selectively reporting “significant” results (simulated in the sketch after this list)
  • Publication bias: journals overwhelmingly publish positive results; negative findings go in the “file drawer”
  • Small sample sizes: underpowered studies produce noisy, unreliable estimates
  • HARKing: Hypothesizing After Results are Known — presenting exploratory findings as confirmatory
  • Perverse incentives: careers rewarded for novel, positive, “significant” findings rather than rigorous replication
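You can see why p-hacking works by simulating it: test enough pure noise and something “significant” appears. A minimal sketch using scipy (no real effect exists anywhere in this data):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(seed=0)
n_experiments, n_tests, alpha = 1000, 20, 0.05

false_positive_runs = 0
for _ in range(n_experiments):
    # 20 candidate outcome variables, all pure noise: the true effect is exactly zero
    found = False
    for _ in range(n_tests):
        a = rng.normal(size=30)
        b = rng.normal(size=30)
        if ttest_ind(a, b).pvalue < alpha:
            found = True
            break  # the p-hacker stops at the first "significant" result
    false_positive_runs += found

print(f"At least one p < {alpha} in {false_positive_runs / n_experiments:.0%} of runs")
# With 20 independent tests, expect roughly 1 - 0.95**20 ≈ 64% of runs to
# produce a "finding" despite zero true effects.
```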

What to Do About It

  • Pre-registration: publicly declare hypotheses and methods before collecting data
  • Open data & open code: let others verify your analysis
  • Effect sizes over p-values: report how big the effect is, not just whether it crosses a significance threshold (see the sketch after this list)
  • Systematic reviews & meta-analyses: synthesize evidence across many studies
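Reporting an effect size is a short calculation, not a burden. A minimal Cohen's d sketch with fabricated samples:

```python
import numpy as np

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

rng = np.random.default_rng(seed=1)
treated = rng.normal(loc=10.5, scale=2.0, size=200)   # fabricated outcomes
control = rng.normal(loc=10.0, scale=2.0, size=200)

print(f"Cohen's d = {cohens_d(treated, control):.2f}")
# d ≈ 0.25 here: a large enough sample can make a small effect "significant",
# which is exactly why the size of the effect, not just p, should be reported.
```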

Sources: Ioannidis, PLoS Medicine 2005; Open Science Collaboration, Science 2015

Digital MRV

Measurement, Reporting, and Verification (MRV) is the backbone of carbon markets, climate finance, and environmental compliance. Digital tools are transforming MRV from periodic, manual auditing to continuous, automated monitoring.

IoT Sensors
Real-time monitoring of water quality, energy consumption, methane emissions, and land use. Sensors transmit data automatically, eliminating manual data collection. Example: Virridy’s Lume sensor monitors water system usage in real time across thousands of sites.
Satellite + AI
Machine learning applied to satellite imagery can detect deforestation, measure crop yields, track methane plumes, and verify reforestation projects at scale. Companies like Pachama and Chloris use this for forest carbon verification.
Blockchain & Transparency
Immutable ledgers for carbon credit transactions. Prevents double-counting. Smart contracts automate credit issuance when sensor data confirms emissions reductions. Still early-stage but growing rapidly.

Connection to carbon markets: The credibility crisis in carbon markets (recall Week 12) stems partly from weak MRV. If you can’t accurately measure and verify emissions reductions, credits have no integrity. Digital MRV is the technological solution to this trust deficit.

Sources: World Bank Digital MRV report; Gold Standard Digital MRV framework; Virridy

Geospatial Tools

Geographic information systems (GIS) and remote sensing have democratized access to spatial data, enabling sustainability analysis at scales from local to global.

GIS & Mapping
Geographic Information Systems overlay spatial data layers: population density, water resources, land use, infrastructure, climate risk. QGIS (free, open-source) and ArcGIS (industry standard). Essential for site selection, resource planning, and equity analysis.
Google Earth Engine
Free cloud-based platform with petabytes of satellite imagery and geospatial datasets. Enables planetary-scale analysis without downloading data. Used for deforestation monitoring, flood mapping, urban expansion, crop yield estimation, and more.
Our World in Data
Free, open-access data visualization platform covering sustainability topics: emissions, energy, food, health, poverty. Interactive charts with downloadable data. An essential resource for evidence-based arguments and teaching.
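As a taste of what Earth Engine enables, here is a minimal Python API sketch that sums forest loss inside a region. It assumes the earthengine-api package, an authenticated account, and the UMD Hansen Global Forest Change asset; the asset ID, version, and band names below follow the published dataset but should be checked against the current catalog:

```python
import ee

ee.Initialize()  # requires prior `earthengine authenticate` (setup assumed)

# Hansen Global Forest Change (verify the asset ID/version in the data catalog)
gfc = ee.Image("UMD/hansen/global_forest_change_2023_v1_11")
loss = gfc.select("loss")  # 1 where tree cover was lost since 2000

# Illustrative bounding box (lon/lat) over a hypothetical study area
region = ee.Geometry.Rectangle([29.0, -2.5, 30.0, -1.5])

# Sum loss pixels, weighted by pixel area, to estimate hectares lost
loss_area_ha = (
    loss.multiply(ee.Image.pixelArea()).divide(10_000)
        .reduceRegion(reducer=ee.Reducer.sum(), geometry=region,
                      scale=30, maxPixels=1e9)
        .getInfo()
)
print("Approx. forest loss (ha):", loss_area_ha)
```

The computation runs on Google's servers; only the final sum is downloaded, which is what makes planetary-scale analysis feasible without local data.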

Other Key Platforms

  • NASA SEDAC: Socioeconomic Data and Applications Center — population, hazards, land use
  • Global Forest Watch: near-real-time deforestation alerts from satellite data
  • Climate TRACE: independent GHG emissions tracking for every country and major facility
  • OpenStreetMap: crowdsourced global mapping — critical in data-poor regions

Sources: Google Earth Engine; Our World in Data; NASA SEDAC; Global Forest Watch

Data Visualization

Visualization is how data becomes knowledge. Good charts inform; bad charts mislead. As engineers, your ability to communicate data visually is as important as your ability to collect it.

Tufte’s Principles

  • Data-ink ratio: maximize the ink devoted to data; minimize non-data ink (gridlines, boxes, decoration)
  • Chartjunk: 3D effects, unnecessary legends, gradient fills, and clip art distract from the data
  • Small multiples: repeat the same chart structure across categories for easy comparison
  • Show the data: don’t hide individual points behind summary statistics

Common Pitfalls

  • Truncated y-axes: starting at non-zero exaggerates differences
  • Dual y-axes: almost always misleading — implies correlation where none exists
  • Pie charts: humans are bad at comparing angles; use bar charts instead
  • Cherry-picked time frames: selecting start/end dates to support a narrative

Charts That Inform

Best Practices
Label axes clearly with units. Include data sources. Use colorblind-friendly palettes. Title charts with the takeaway, not the topic (“Solar costs fell 90% since 2010” not “Solar costs over time”). Annotate key inflection points. Let the reader see the pattern.
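A minimal matplotlib sketch applying those practices. The cost figures are illustrative stand-ins for the solar cost decline mentioned above, not a real dataset:

```python
import matplotlib.pyplot as plt

# Illustrative utility-scale solar cost figures ($/MWh), NOT real data
years = [2010, 2012, 2014, 2016, 2018, 2020]
cost  = [ 378,  288,  193,  125,   68,   37]

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(years, cost, marker="o", color="#1a7a4a")

ax.set_title("Solar costs fell ~90% between 2010 and 2020")  # takeaway, not topic
ax.set_xlabel("Year")
ax.set_ylabel("Levelized cost of electricity ($/MWh)")
ax.set_ylim(bottom=0)                                # no truncated y-axis
ax.spines[["top", "right"]].set_visible(False)       # minimize non-data ink
ax.annotate("cheaper than new coal in most markets", # annotate the key point
            xy=(2020, 37), xytext=(2014.5, 120),
            arrowprops=dict(arrowstyle="->"))
fig.text(0.01, 0.01, "Source: illustrative data for this example", fontsize=8)

plt.tight_layout()
plt.show()
```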

Tools

  • Python: matplotlib, seaborn, plotly (interactive)
  • R: ggplot2 (grammar of graphics)
  • JavaScript: D3.js, Chart.js, Observable
  • No-code: Datawrapper, Flourish, Google Sheets

Sources: Tufte, The Visual Display of Quantitative Information 2001; Schwabish, Better Data Visualizations 2021

Data Ethics

Collecting data about people creates power dynamics and responsibilities. Ethical M&E requires centering the rights and dignity of the people whose data you collect.

Informed Consent
Participants must understand what data is being collected, how it will be used, who will see it, and that they can refuse without consequences. Consent must be truly voluntary — not coerced by power imbalances (e.g., a community dependent on your organization for services).
Data Privacy & Security
Personal data must be anonymized or pseudonymized. GPS coordinates of vulnerable communities can put people at risk. Health data, income data, and behavioral data require special protections. GDPR (EU) and similar regulations set legal minimums, but ethical standards should exceed them.
Community Ownership of Data
Who owns the data: the funder, the researcher, or the community? Indigenous Data Sovereignty (CARE principles) asserts that communities should control data about them. Data should benefit the community, not just extract information from it.
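A minimal pseudonymization sketch using only the standard library. Note the hedge built into the code: a salted hash is pseudonymization, not anonymization, and the field names are illustrative:

```python
import hashlib
import secrets

# The salt must be stored separately from the data and never published
SALT = secrets.token_hex(16)

def pseudonymize(participant_id: str) -> str:
    """Replace an identifier with a stable, non-reversible token."""
    return hashlib.sha256((SALT + participant_id).encode()).hexdigest()[:12]

def coarsen_gps(lat: float, lon: float, decimals: int = 2) -> tuple[float, float]:
    """Round coordinates to roughly 1 km so households can't be pinpointed."""
    return round(lat, decimals), round(lon, decimals)

record = {"id": pseudonymize("Jane Doe, Village A"),
          "gps": coarsen_gps(-1.95032, 30.05871)}
print(record)
# Pseudonymization alone is NOT anonymization: combinations of other fields
# (village, age, occupation) can still re-identify people. Minimize what you collect.
```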

The extractive pattern: Too often, researchers collect data from communities, publish papers, advance their careers, and never return results to the people who provided them. Ethical M&E requires reciprocity.

Sources: CARE Principles for Indigenous Data Governance; Belmont Report; GDPR

Case Study: Virridy’s Rwanda Program

A real-world example of how rigorous M&E transformed a water technology intervention from an assumption into peer-reviewed evidence.

The Intervention

  • Virridy (formerly SweetSense) deployed IoT sensors on community water systems across rural Rwanda
  • Sensors monitored water flow in real time, detecting breakdowns within hours instead of weeks
  • Rapid repair response reduced downtime from weeks or months to days
  • Paired with a randomized controlled trial to measure health impact
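The core of sensor-based functionality monitoring is simple in principle: flag a system as down when flow stops, and turn continuous data into an uptime indicator and an alert. A minimal sketch over daily flow readings; the threshold and data are illustrative, and Virridy's actual pipeline is certainly more sophisticated:

```python
# Daily flow readings (liters/day) from one water point; zeros indicate a breakdown
daily_flow = [950, 1010, 980, 0, 0, 0, 0, 890, 940, 0, 0, 915]
THRESHOLD = 50  # below this, treat the system as non-functional (illustrative)

down_days = sum(flow < THRESHOLD for flow in daily_flow)
uptime = 1 - down_days / len(daily_flow)
print(f"Uptime: {uptime:.0%}")  # feeds the '% functional' output indicator

# Alerting: notify the repair team after 2 consecutive low-flow days,
# turning weeks of unnoticed downtime into a response measured in days
streak = 0
for day, flow in enumerate(daily_flow, start=1):
    streak = streak + 1 if flow < THRESHOLD else 0
    if streak == 2:
        print(f"Day {day}: dispatch repair team")
```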

The Theory of Change

IF sensors enable rapid detection of water system failures, AND repair teams respond quickly, THEN communities will have more consistent access to safe water, WHICH LEADS TO reduced waterborne disease.

The Evidence

Published in The Lancet Global Health (Thomas et al.)
The RCT demonstrated a 29% reduction in diarrheal disease in children under 5 in communities with sensor-monitored water systems compared to control communities. This was one of the first studies to show that monitoring infrastructure functionality — not just building it — is what drives health outcomes.

Key M&E Lessons

  • Outputs are not outcomes: installing a water system (output) does not guarantee safe water access (outcome)
  • Sensors enabled the counterfactual: continuous data showed exactly when systems were working vs. broken
  • RCT design was critical: without randomization, the 29% reduction could not be causally attributed to the intervention
  • Data created accountability: real-time dashboards motivated faster repair response

Sources: Thomas et al., The Lancet Global Health; Virridy (virridy.com)

Application: Your Theory of Change

For your course project, develop a Theory of Change that maps how your proposed intervention leads to the sustainability impact you claim.

Step 1: Define the Problem
What specific sustainability challenge are you addressing? Who is affected? What is the current situation? Be specific — not “climate change” but “high energy consumption in CU residence halls due to outdated HVAC systems.”
Step 2: Describe Your Intervention
What exactly are you proposing? What activities will you undertake? What resources do you need? Be concrete enough that someone else could implement it.
Step 3: Map the Causal Chain
IF [your intervention] THEN [immediate output] WHICH LEADS TO [outcome] BECAUSE [mechanism]. Identify every assumption in the chain. Which are strongest? Which are most uncertain?
Step 4: Choose Indicators
For each level (output, outcome, impact), define a SMART indicator. How would you measure it? What data would you need? What’s your baseline?
Step 5: Design Your Evaluation
What is your counterfactual? How would you know if the observed change was caused by your intervention and not something else? Even a simple before-after comparison with a control group is better than nothing.

Key Takeaways

1. Good intentions require good evidence
Many well-meaning sustainability interventions fail or have unintended consequences. M&E is how you tell the difference between what works and what feels good.
2. A Theory of Change makes your logic testable
By making assumptions explicit, you can identify the weakest links in your causal chain before investing millions of dollars.
3. Digital tools are transforming M&E
IoT sensors, satellite imagery, and AI enable continuous, objective, real-time monitoring at scales previously impossible. This is the future of accountability.
4. Beware of bad evidence
The replication crisis shows that even published, peer-reviewed research can be wrong. Look for pre-registered studies, large samples, and independent replication.
5. Data is power — use it ethically
Informed consent, data privacy, and community ownership are not optional. The people whose data you collect should benefit from it.

Further Reading

  • Gertler, P. et al. (2016). Impact Evaluation in Practice. World Bank (free download).
  • Banerjee, A. & Duflo, E. (2011). Poor Economics: A Radical Rethinking of the Way to Fight Global Poverty.
  • Tufte, E. (2001). The Visual Display of Quantitative Information. 2nd edition.
  • Our World in Data: ourworldindata.org
  • Ioannidis, J. (2005). “Why Most Published Research Findings Are False.” PLoS Medicine, 2(8).
  • Thomas, E. et al. “Monitoring and evaluation of water systems in Rwanda.” The Lancet Global Health.

Assignment: Draft a 1-page Theory of Change for your course project. Include: problem statement, intervention description, causal chain with assumptions, and at least 3 SMART indicators at different levels of the logframe.