What a Viral Post Taught Me About Real-World Data
- nyuimclegacy
- May 6
Student experience submission by Nihareeka Mhatre - NYU SPS MS Integrated Marketing Communications
Before grad school, I thought I understood virality.
It felt like a mix of timing, creativity, and a little algorithmic luck. Maybe a trending audio clip or a cheeky caption. Maybe the right influencer. I had seen content blow up, disappear, spark discourse, or launch products—but I’d never had to break it down, line by line, into something explainable. Until this semester.
As part of our final project for Database Modeling and Management at NYU SPS, my team set out to do exactly that: decode virality across platforms using real data. The goal was to figure out what makes a post go viral on TikTok, Instagram, YouTube, or Twitter—not based on hunches, but on patterns. We were given over 4,800 high-performing social media posts to analyze, with metrics ranging from likes and comments to shares and views. It sounded like a dream project. And it was. But not for the reasons I expected.

When the Data Doesn’t Cooperate
Here’s what they don’t tell you in the beginning: real-world data has a personality. It’s messy. Inconsistent. Full of surprises. Our dataset was no exception.
We tried to run standard regression models to predict engagement metrics. We applied classification trees, ran ANOVA tests, and checked every box in the statistics playbook. But most of it came up inconclusive. Platform and content category—our main predictors—didn’t explain much. The R² scores were laughably low. We found ourselves with dozens of models and almost no strong signals.
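For anyone curious, here is roughly what one of those baseline models looked like. The column names ("likes", "platform", "content_category") and the file name below are placeholders rather than the exact fields from our dataset, but the shape of the analysis is the same:

```python
# A rough sketch of the kind of baseline model we kept fitting.
# Column and file names are placeholders, not our actual schema.
import pandas as pd
import statsmodels.formula.api as smf

posts = pd.read_csv("viral_posts.csv")  # ~4,800 high-performing posts

# Ordinary least squares: predict likes from platform and content category
ols = smf.ols("likes ~ C(platform) + C(content_category)", data=posts).fit()

print(ols.summary())               # coefficients, p-values, diagnostics
print("R-squared:", ols.rsquared)  # this is where the low numbers showed up
```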
At first, it was frustrating. Had we done something wrong? Was the data broken?
Turns out, it was something much more humbling: the complexity of human behavior doesn’t always show up neatly in a spreadsheet. People don’t like, share, or comment based on tidy categories. They respond emotionally. Contextually. Platform culture, content tone, even the time of day can shape whether a post gets buried or goes viral. And these layers weren’t in our dataset—yet they mattered the most.
Reframing the Question
Halfway through the project, we shifted our approach.
Instead of asking how much people engaged, we asked how they engaged. We engineered new metrics—like likes-to-shares and comments-to-likes—to capture emotional interaction patterns. Suddenly, the results started to feel real. TikTok showed bursts of admiration but low conversation. YouTube, on the other hand, sparked deeper reflections. Twitter was built for sharing and discourse. Instagram was steady, visual, and trust-driven.
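To make that concrete, here is a simplified sketch of the ratio features we engineered. Again, the column names are illustrative stand-ins for the fields in our dataset:

```python
# Simplified version of the engineered engagement ratios.
# Column and file names are illustrative, not taken from our dataset.
import numpy as np
import pandas as pd

posts = pd.read_csv("viral_posts.csv")

# Ratios capture *how* people engage, not just how much
posts["likes_to_shares"] = posts["likes"] / posts["shares"].replace(0, np.nan)
posts["comments_to_likes"] = posts["comments"] / posts["likes"].replace(0, np.nan)

# Compare interaction styles across platforms
print(
    posts.groupby("platform")[["likes_to_shares", "comments_to_likes"]].median()
)
```

Summarizing those ratios platform by platform is what made the behavioral differences between TikTok, YouTube, Twitter, and Instagram visible in the first place.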
It wasn’t about “which platform is best” anymore. It became about understanding the behavioral logic of each space—and how brands can design for that logic.
This reframing wasn’t just a methodological shift. It was a mindset shift. And it taught me that the best marketing insights often come from the moments when the data resists you—when you have to pause and ask, "What are we really trying to understand here?"

Lessons Beyond the Project
The project itself taught me a lot: how to work with regression diagnostics, how to clean and recode data, how to build a CHAID tree, and when to walk away from a model that’s not serving the question. But what stuck with me even more were the softer lessons.
Working in a group of peers from different backgrounds meant negotiating styles, merging ideas, and building something coherent together. Presenting to our professor and classmates meant defending our logic, admitting gaps, and fielding questions that pushed our thinking. It wasn’t always easy—but it felt real. Like a glimpse into the kind of work we’ll do beyond the classroom.
It also reminded me why I chose NYU SPS. The Integrated Marketing program isn’t just about tools or trends—it’s about developing the instincts to ask better questions, adapt your approach, and bring clarity to messy, data-rich challenges. That’s what the industry needs. And that’s what this program trains us to do.
If I Had to Do It Again…
If I could go back, I’d make two changes: first, I’d push us to switch our models earlier. Our diagnostics clearly showed that linear regression wasn’t working, but we stuck with it for too long. In hindsight, methods like random forests or quantile regression might’ve captured the chaotic, uneven nature of virality better.
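As a rough sketch of what that switch might have looked like, here is a random forest in place of OLS; the feature and target names are placeholders, not our actual schema:

```python
# Hedged sketch: swapping the linear model for a random forest.
# Column and file names are placeholders, not our actual schema.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

posts = pd.read_csv("viral_posts.csv")
X = pd.get_dummies(posts[["platform", "content_category"]])  # one-hot encode
y = posts["likes"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestRegressor(n_estimators=300, random_state=42)
forest.fit(X_train, y_train)

print("Held-out R-squared:", r2_score(y_test, forest.predict(X_test)))
```

Unlike a straight-line fit, a forest makes no assumption that engagement scales linearly with anything, which is part of why it tends to cope better with the skewed, bursty numbers virality produces.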
Second, I’d advocate for more context-rich data. Variables like time of post, creator identity, sentiment, or trending audio weren’t available in our set—but they could’ve made all the difference. Real-world marketing data isn’t just about what’s easy to measure. It’s about what actually drives behavior.
Final Thoughts
In the end, what we built wasn’t a perfect model—but it was a meaningful one. It told a story about how people interact online, and what brands need to consider if they want to connect authentically. For a class project, that feels like a win.
Sometimes, a viral post can spark joy. Sometimes, it can spark a conversation. For me, this one sparked a deeper understanding of data, storytelling, and the emotional logic behind digital behavior.
And that’s a kind of virality I’ll take with me far beyond this semester.



