This document describes how a company modernized its metadata systems with New Relic. The legacy architecture was monolithic and difficult to scale, so the content pipeline was rebuilt as easily scaled microservices, with New Relic APM monitoring the new systems. New Relic Insights was used to analyze performance and compare the new and legacy classification engines, giving business partners dynamic dashboards with near real-time visibility instead of static spreadsheets.
1. Creating Modern Metadata Systems with New Relic
Bill Sammons
Head of Content Enrichment
August 9, 2016
2. Creating Modern Metadata Systems with New Relic
CREATING MODERN METADATA SYSTEMS
Technology Stack Transformation
New Relic Insights for Classification Engine Transition
3. Creating Modern Metadata Systems with New Relic
Content Pipeline Rebuild
• Ingestion & Enrichment Pipeline
• Legacy Architecture
– Mainly Centralized Functionality
– Monolithic
– Hard, if not impossible, to scale efficiently
• Goals
– Easy to scale – Expectations for Significant Growth
– Reduce Data Center Footprint
– Update technologies that had long gone stale
4. Creating Modern Metadata Systems with New Relic
Legacy Technology Stack
• Coding Language – C++
• Software Development – manual build, test; CVS
• System Resource Monitoring – Cacti
• Interface – HTTP
• Infrastructure – Physical Servers & Load Balancers in Corporate Data Centers
• Server Acquisition – Purchase
• Server Deployment – Sys Admins
• Log Collection – Splunk
• Escalations – Operations staff monitoring Splunk output
• Content Classification – SAP SDX
• Communications – emails/meetings
• Project Management – MS Project/Project Manager
5. Creating Modern Metadata Systems with New Relic
6. Creating Modern Metadata Systems with New Relic
Impossible Tasks now Possible!
• New Annotators – How do we apply new Metadata to an Archive of 1.5B Articles?
• Refresh annually – Even more challenging
• Reusability of full Content Pipeline for Consumer Business Purposes
7. Creating Modern Metadata Systems with New Relic
8. Creating Modern Metadata Systems with New Relic
Early Days with New Relic
• APM on Legacy Systems – modest value – C++ code base
• Alerts Integration with OpsGenie
• Built Plug-in to extract custom data from legacy code
• APM on Rearchitected Systems – increased value – Java code base
• Insights for Technology Purposes primarily
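The deck doesn't show how the plug-in for the legacy C++ systems was built. As a loosely related sketch only (not the team's actual plug-in), one generic way to push custom data from a process that isn't running the Java agent into Insights is the Insights Insert API over plain HTTP; the account ID, insert key, event name, and attributes below are placeholders for illustration:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class InsightsInsertSketch {
    public static void main(String[] args) throws Exception {
        // Placeholders – substitute a real account ID and Insert API key.
        String accountId = "YOUR_ACCOUNT_ID";
        String insertKey = "YOUR_INSERT_KEY";

        // One custom event as JSON; eventType is the only required attribute.
        String payload = "[{\"eventType\":\"LegacyPipelineComponent\",\"queue_time\":125,\"language\":\"en\"}]";

        URL url = new URL("https://insights-collector.newrelic.com/v1/accounts/" + accountId + "/events");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setRequestProperty("X-Insert-Key", insertKey);
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Insights insert returned HTTP " + conn.getResponseCode());
    }
}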
9. Creating Modern Metadata Systems with New Relic
APM & OpsGenie
10. Creating Modern Metadata Systems with New Relic
PRODUCTION RELEASE PERFORMANCE YTD
11. Creating Modern Metadata Systems with New Relic
12. Creating Modern Metadata Systems with New Relic
WEEKLY CHANGE IN PERFORMANCE – CLASSIFIER
13. Creating Modern Metadata Systems with New Relic
Classification Engine Update
• Classification of documents
– 3 Taxonomies – News Subjects, Industries, Regions
– 1000’s of Nodes
– 7 Languages
• Key Component to Discovery and Organization in Products
• Very Different Technologies – Different Results Expected
14. Creating Modern Metadata Systems with New Relic
Insights to the Rescue
• Business Partner ask:
– Scores of spreadsheets
– Static data
– Compare old vs new
• New Relic Insights
– A few dashboards
– Dynamic Data
– Drill through capabilities
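The deck doesn't spell out how the old-vs-new comparison data reaches Insights, but the event and attribute names in the NRQL shown on a later slide (MetadataRegionCodes, mpc_doc_hash, essex_product_effect, code, language) suggest one custom event per document and taxonomy code. A minimal sketch, assuming the Java agent API from the code slide; the class, method, and parameter names are illustrative, not from the deck:

import java.util.HashMap;
import java.util.Map;
import com.newrelic.api.agent.NewRelic;

public class ClassificationComparisonRecorder {
    // Record how the new classifier's output differs from the legacy engine for one
    // document/code pair, using the attribute names that appear on the NRQL slide.
    public void recordComparison(String docHash, String regionCode,
                                 String productEffect, String language) {
        Map<String, Object> attributes = new HashMap<String, Object>();
        attributes.put("mpc_doc_hash", docHash);               // document identity, used by uniquecount()
        attributes.put("code", regionCode);                    // taxonomy node, used as the FACET
        attributes.put("essex_product_effect", productEffect); // e.g. "Added to Search" or "None"
        attributes.put("language", language);                  // one of the seven supported languages
        attributes.put("environment", "INT");                  // environment filter used in the dashboards
        attributes.put("nr_ver", 1);                           // schema version, also filtered on in the NRQL
        NewRelic.getAgent().getInsights().recordCustomEvent("MetadataRegionCodes", attributes);
    }
}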
15. Creating Modern Metadata Systems with New Relic
16. Creating Modern Metadata Systems with New Relic
17. Creating Modern Metadata Systems with New Relic
18. Creating Modern Metadata Systems with New Relic
Code Simple!
// Declare the map of event attributes.
Map<String, Object> NewRelicInsightsParams = new ConcurrentHashMap<String, Object>();

// Populate it with timings derived from the document's audit trail.
long mdp_queue_time = start.getTimeInMillis()
    - auditTrail.get(auditTrail.size() - 1).getAuditEntryCreatedTime()
          .toGregorianCalendar().getTimeInMillis();
long time_since_creation = start.getTimeInMillis()
    - auditTrail.get(0).getAuditEntryCreatedTime()
          .toGregorianCalendar().getTimeInMillis();
NewRelicInsightsParams.put("queue_time", mdp_queue_time);
NewRelicInsightsParams.put("time_since_creation", time_since_creation);
…

// Record the custom event so it shows up in Insights as MetadataPipelineComponent.
NewRelic.getAgent().getInsights().recordCustomEvent("MetadataPipelineComponent", NewRelicInsightsParams);
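For readers trying this outside the slide, a self-contained version of the same idea follows. It is a sketch, not the production code: only the agent call (com.newrelic.api.agent.NewRelic) and the attribute names come from the slide, while the class, the simplified AuditEntry type, and the method signature are assumptions.

import java.util.Calendar;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import com.newrelic.api.agent.NewRelic;

public class MetadataPipelineTimings {
    // Simplified stand-in for the deck's audit-trail entries; the real type exposes
    // getAuditEntryCreatedTime() as an XMLGregorianCalendar.
    public static class AuditEntry {
        private final long createdTimeMillis;
        public AuditEntry(long createdTimeMillis) { this.createdTimeMillis = createdTimeMillis; }
        public long getCreatedTimeMillis() { return createdTimeMillis; }
    }

    public void report(Calendar start, List<AuditEntry> auditTrail) {
        Map<String, Object> params = new ConcurrentHashMap<String, Object>();

        // Time spent in this component's queue: now minus the newest audit entry.
        long queueTime = start.getTimeInMillis()
                - auditTrail.get(auditTrail.size() - 1).getCreatedTimeMillis();

        // Time since the document first arrived: now minus the oldest audit entry.
        long timeSinceCreation = start.getTimeInMillis()
                - auditTrail.get(0).getCreatedTimeMillis();

        params.put("queue_time", queueTime);
        params.put("time_since_creation", timeSinceCreation);

        // Name/value pairs become attributes on the MetadataPipelineComponent custom event.
        NewRelic.getAgent().getInsights().recordCustomEvent("MetadataPipelineComponent", params);
    }
}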
19. Creating Modern Metadata Systems with New Relic
What Makes it Magic?
• Simple to code as we have seen – just Name/Value pairs in Map & Send
• Iterations of dashboards/NRQL incredibly fast
• NRQL – “SQL for Managers”
• Refresh rates on large datasets during drill downs very fast even on complex NRQL
• Ready to answer questions not yet asked
20. Creating Modern Metadata Systems with New Relic
NRQL
• Looks a bit complex but tools and prediction make it easy
– SELECT filter(uniquecount(mpc_doc_hash), WHERE essex_product_effect != 'None') AS '# Doc',
  percentage(uniquecount(mpc_doc_hash), WHERE essex_product_effect != 'None') AS '% Doc',
  uniquecount(mpc_doc_hash) AS 'Total Doc',
  filter(uniquecount(mpc_doc_hash), WHERE essex_product_effect = 'Added to Search') AS '# Add Search',
  percentage(uniquecount(mpc_doc_hash), WHERE essex_product_effect = 'Added to Search') AS '% Add Search',
  filter(uniquecount(mpc_doc_hash), WHERE essex_product_effect = 'Added to Nav') AS '# Add Nav',
  percentage(uniquecount(mpc_doc_hash), WHERE essex_product_effect = 'Added to Nav') AS '% Add Nav',
  filter(uniquecount(mpc_doc_hash), WHERE essex_product_effect = 'Added to Nav & Search') AS '# Add N&S',
  percentage(uniquecount(mpc_doc_hash), WHERE essex_product_effect = 'Added to Nav & Search') AS '% Add N&S',
  filter(uniquecount(mpc_doc_hash), WHERE essex_product_effect = 'Lost from Search') AS '# Loss Search',
  percentage(uniquecount(mpc_doc_hash), WHERE essex_product_effect = 'Lost from Search') AS '% Loss Search',
  filter(uniquecount(mpc_doc_hash), WHERE essex_product_effect = 'Lost from Nav') AS '# Loss Nav',
  percentage(uniquecount(mpc_doc_hash), WHERE essex_product_effect = 'Lost from Nav') AS '% Loss Nav',
  filter(uniquecount(mpc_doc_hash), WHERE essex_product_effect = 'Lost from Nav & Search') AS '# Loss N&S',
  percentage(uniquecount(mpc_doc_hash), WHERE essex_product_effect = 'Lost from Nav & Search') AS '% Loss N&S'
  FROM MetadataRegionCodes
  WHERE environment = 'INT' AND nr_ver = 1
  AND language IN ('en', 'fr', 'de', 'ru', 'es', 'pt', 'it')
  FACET code SINCE 1 week ago LIMIT 1000
21. Creating Modern Metadata Systems with New Relic
Business Partner Feedback
• “New Relic Insights gives us the big picture – in near real-time!”
• “Instant Statistics! We’ve moved from a few static analyses of 100s of stories to 10s of thousands of stories every day, with drill-down capability”
• “We can now prioritize our work and it has become integral to our daily workflow”
• “New Relic Insights gives us vision into code competition that would have been nearly impossible in the past”
• “Insights gives us high confidence that we are delivering a quality solution to our customer in a highly complex problem space”
22. Creating Modern Metadata Systems with New Relic
Thank You!
Editor's Notes
Dow Jones Engineer, Architect & Manager for past 20+ years; Previously Lead Ground Station Engineer at Lockheed-Martin
Currently Manage all aspects of Metadata generation for Professional Information Businesses as well as Consumer Products - WSJ, Barrons, …
High Performance, High Availability systems
I want to talk to you today about a pretty incredible effort we’ve undertaken over the past few years to rebuild our Metadata systems and their corresponding partner systems, and I’ll finish the presentation with a specific use case where we leveraged NRI to assist our business partners’ analyses.
Spend time on the challenges of the pipeline
2M documents/day
24x7x365
Capture/Parse/Normalize/Annotate/Package/Deliver + Archive+Index
1500 feeds – adding every day
29 languages
200M documents received monthly (compression)
1.5-2M docs/day = ~25 docs/sec, every second of every day, all year round. Not, of course, at an even pace. At times we see 80 DPS
We expect documents to be available in our products minutes after they arrive. CP needs to be fast and handle bursts well
Challenge – be able to support 5x that rate
Annotation – goal to do all of this in 0.5 seconds on average
Keywords, KeyPhrases, Company Extraction, ID & Markup, Person Extraction, ID & Markup, Duplicate Identification, Journalist ID, …
Classification of News Subjects, Industries, Regions
Legacy world serialized
We had a legacy technology stack that, like many systems with roots in the early part of the 21st century, was clearly dated and inflexible
When we decided to seriously invest in modernizing these core systems, we knew nearly everything needed to change.
We liken the effort to re-architecting an airplane from tail to nose, in flight, without the passengers knowing it’s happening, and landing safely
Here are some of the elements of the current technology stack that we have deployed over the past few years.
We’ve moved from classic waterfall PM to Agile and we leverage JIRA to manage our Sprint Ceremonies and activities
Development environment – Java primarily + Python
Team Communications – Slack! Love Slack – key to ensure timely and frequent communications in a multi-team development effort – we pushed this hard – e-mail used MUCH less frequently
Build processes – CI/CD – Jenkins, GitHub, Artifactory
Deployment in the cloud – AWS CloudFormation, AMIs, Chef + Homegrown tools to generate CF Templates easily and consistently across stacks (Shout out to Scott Rahner!)
Monitoring and Analytics – NewRelic APM, Plug-ins, Insights – I’ll talk a lot about a specific use case for Insights in a few minutes
Think about the scope of this problem. 1.5 Billion documents. For a full Metadata recoding effort at 0.5 seconds per document, that’s 750M processing seconds
Our goal is to complete full recoding in 4 weeks so we need to process at 300 DPS 24x7 for a month. The investment to do this before the transformation would have been prohibitive – the effort is DOA.
With AWS, ASG, CF across our entire Metadata Platform, it becomes achievable and reasonably priced.
This is what we mean by Metadata Platform.
Each one of these green bubbles is an AWS Auto Scaling group that we can deploy in as many environments as we like.
We can now bring up one of these specifically for processing the archive and let it churn.
Here we can see a recent event where our pipeline got backed up at the MPC portion. A significantly larger-than-normal load was dropped in, a New Relic alert triggered, and we were notified through our OpsGenie channel.
In this instance, the system recovered naturally, but had the load continued to increase and the backlog become unmanageable, we could have scaled up portions of the Metadata Platform to drain the queue.
Here we see another dashboard which allows us to monitor how our production Metadata Platform is performing across software releases. We have a quick way to understand and look back.
Here’s another key metric for our Content Pipeline – number of gigabytes delivered day by day. Shout out to Pankaj Takawale for providing this one.
This is a different kind of performance dashboard with some heatmaps that allow us to see how our new run-time classification engine is performing across different document sizes. This was a real concern for the update project and we can easily see from week to week how the configuration changes are performing. If that dark blue box starts to slide down we can review the config changes with our business partners before it gets out to production and causes issues.
So that leads me to the 2nd half of my presentation, where I want to share with you a novel use of NRI to assist the Data Strategies Group (DSG) in its effort to replace our classification engine.
How different? Good or bad differences?
Hard to really know how the technology transition would look against 2M documents per day in 7 languages
The effort level to get what they asked for was actually high, and the work by DSG to analyze it was even higher
By the time the analysis was done, changes in configuration would have made it stale – forcing another round, and then another
It so happened this was around the holiday break last year; I was out of vacation time (trip to HI), we had a new hire who was around as well, and we decided we’d dive into NRI and see what it could do
After a few iterations we ended up with a really cool Data App that our DSG team loves – and we love it as well: our work is done, and DSG gets fresh data all the time
Here’s one of the dashboards we’ve created for our business partners that allows them to see the codes that the new classification engine is coding differently from the old engine. We give them multiple ways to slice and dice the results by language, by publication type, source name, etc.
As they filter the code list, the impacts at the bottom update dynamically and they can focus in on problem areas.
We also have linked this dashboard to another dashboard that then lets them drill through to specific stories and see in great detail the codes and the impacts.
This turbocharges their refinement process.
Here’s an e-mail exchange from one of our business partners in Barcelona. He found a new area of concern and was having difficulty zeroing in on the issue using the native tools of the classifier.
In just a few hours we had the following dashboard for him.
The turnaround time is impressive.
Here we can see the top average-scoring codes across the past 7 days; selecting a code shows the headlines and scores of the top stories.
We’ve created 2 Data Apps that are specifically for our business partners and they LOVE them.
This is a snippet of code that sends a custom event to NRI
We capture a few timings showing how long the item has been in our queue and how long it’s been since it arrived at Dow Jones’ front door
Can’t get much easier than that
We capture many such attributes and send them in a few different Custom Event tables
A few of us have become “NRQL-heads”
There are a few tricks we’ve learned along the way to improve the dashboards for our users, but the real trick is understanding your data and how it is likely to be assessed… what questions will people likely ask?
Watch out for “1” vs 1 – string and numeric attribute values behave differently in NRQL queries.
No negative performance impacts on a high performance system.
Trial/Error – since the feedback is so quick, it is easy to try something, get it wrong a few times, then get it right.
Needless to say, our business partners are thrilled. They are looking for new ways to leverage NRI. One caveat: our solution required sending very large numbers of Custom Events – you can run into license limits fast!
That last one is truly telling. Confidence – Quality
Thank you. Hopefully you can see Dow Jones’ dedication to modernizing our world and how New Relic plays a big part.