QCon San Francisco Presentation, Scaling Ancestry DNA with HBase and Hadoop

This was presented at QCon San Francisco on 11/11/2013 as part of the "Architectures you always wondered about" track. It was a tag team effort by Jeremy Pollack and myself. We presented the manager/developer points of view. It was a well-received presentation.

  • [BILL] DNA Matching: We will walk through an example of how matching works, discuss how GERMLINE implemented the matching, and contrast that with the Hadoop/HBase implementation we created.
  • At Ancestry.com, our mission is to help people discover, preserve, and share their family history. We have 12 billion records, 10 PB of supporting data, and 30,000+ record collections of various sizes; DNA is another way to make discoveries.
  • Everything from birth certificates, obituaries, immigration records, census records, voter registration, old phone books, everything.
  • [TRANSITION TO JEREMY] Typically, the way it works is this: you search through our records to find one of your relatives. Once you've found enough records that you're satisfied you've found your relative, you attach them to your family tree. After that, Ancestry goes to work for you. Our search engine takes a look at your whole tree to find relatives that you may not know about yet, and presents these to you as hints (the shaky leaf). You can then examine these hints and see if they are, in fact, related to you. It's pretty cool! And the beauty of it is that, say you've found a relative who's researched their family tree pretty extensively? Well, you get to piggyback on all that research by simply adding their family tree to yours. A fine example of crowdsourcing in action. So, that's the standard way to do genealogy, and it works quite well -- on average, Ancestry.com handles more than 40 million searches per day, and our members have made more than 6 billion connections between their trees and other subscribers' trees. However, sometimes this process breaks down. For example, what happens if your family came to the country as slaves? We wouldn't have any immigration records and wouldn't have any census or voting records before a certain date; you could trace your family back to a certain point, but after that, you hit a wall. Another example: what happens if you just don't know your extended family very well? It's hard to search for people when you don't know their names or anything about them.
  • Enter DNA! Spit in a tube, pay $99, learn your past. Basically, you send us a DNA sample, we analyze it, and help you learn more about your family. First, we decode your family origins: whether you're Swedish, Chinese, or Indian, or some combination of the above, we can tell you what regions of the globe your ancestors came from. Second, we can help you find your long-lost relatives, and this is largely what we'll be talking about today. This knowledge can be significant; the average customer finds 45 relatives in our database who are 4th cousin or closer. This is how we improve on the standard methods of genealogy: even if you don't know your extended family, or have no way of knowing them, as long as they're in our database, we can help you make a connection with them.
  • [BILL? JEREMY?] This is one quadratic curve we like to see. As the database pool grows, the number of genetic cousin matches found grows rapidly. On average, we deliver 45 4th-cousin matches to each user who takes a DNA test. Why is this important? A 4th-cousin match means there is likely a common ancestor about 150 to 300 years ago. Our confidence level is about 90% for a 4th-cousin match, which means roughly 40 of the 45 suggested matches are valid genetic cousins. When we started and the DNA pool size was 10,000, there was only one 4th-cousin match per user. So you can see the growth.
  • [TRANSITION TO BILL]
  • [BILL AND JEREMY] Every scientist thinks they can code – because they have been doing it for a long time on their own or in an academic environment. But they don't know what it means to build, deploy, and support "production" code. Software engineers understand production code. They just think they understand the math and statistics – after all, they are computer scientists. They can understand the science behind DNA; after all, they took biology in high school. That is nowhere near the education of a Bioinformatics or Population Genetics PhD. The Science Team are the domain experts, and the engineers are required to build a production system to meet the domain experts' needs. We really started light: 3 developers and 2 scientists. In fact, for the first 3 months we "borrowed" engineers from other projects to get this started.
  • [TRANSITION TO BILL] We repurposed this box from our Redis caching layer. This is when you are told the machine you ordered has been given to another team – don't worry, we'll replace it soon. The original GERMLINE ran on 10 threads, and by the time we were up to an 80K pool, we'd gone down to 4 threads and were still swapping to disk – and we had maxed out the memory for this machine. If we stayed in this configuration (which matched many academic environments), the only option was to increase the hardware: more CPUs, more memory. Scaling vertically just plain sucks!
  • Critically important. In software development you must measure your performance at every step. It does not matter what you are doing: if you are not measuring your performance, you can't improve. The last point is critical. We could determine the formula for the performance of key phases (correlate this) and use that formula to predict future performance at particular DNA pool sizes. We could see the problems coming and knew when we were going to have performance issues. (A minimal timing sketch appears after these notes.) Story #1: Our first step that was going out of control (going quadratic) was the first implementation of the relationship calculation, which happens just after matching. This step was basically two nested for loops that walked over the entire DNA pool for each input sample. Simple code; it worked with small numbers and fell over fast. The time was approaching 5 hours to run. Two of my developers rewrote this in Perl and got it down to 2 minutes 30 seconds. They were ecstatic. One of our DNA scientists (PhD in Bioinformatics, MS in Computer Science – he knows how to code) wrote an AWK command (nasty regular expressions) that ran in less than 10 seconds. My devs were humbled. For the next week, whenever they ran into Keith, they formally bowed to his skills. (All in good nature, all fun.)
  • Static by batch size (Phasing): some steps took a long time but were very consistent; a worry, but not critical to fix up front. Linear by DNA pool size (Pipeline Initialization): we looked at ways to streamline and improve the performance of these steps. Quadratic: those are the time bombs (GERMLINE, relationship processing, GERMLINE results processing). The only way we knew this was coming was because we measured each step in the pipeline.
  • [TRANSITION TO JEREMY]
  • Very smart people at Columbia University came up with GERMLINE.
  • Remember, for an academic, running a 1,000-sample set through GERMLINE was "large". I've talked to people who kept re-running the same 50 fish DNA samples through GERMLINE to clean up the variations between sample extractions (think of it as eliminating all the zeros). In a lot of ways, we were using GERMLINE in a way it was not built for.
  • Mention how we kept upgrading and tightening things up
  • Our projections showed how bad the execution time would get. As we approached 120K for the DNA pool size, each additional 1,000-sample set would require 700 hours to complete – over 4 weeks.
  • [JEREMY AND BILL] Jermline, with a "J" (the lead engineer's first name is Jeremy). This was a "modified clean room" implementation of the algorithm: we read the reference paper, looked at some parts of the reference implementation code, and based our work on that.
  • [JEREMY]
  • Using Battlestar Galactica for the matching example.
  • For each person-to-person comparison, we add up the total length of their shared DNA and run that through a statistical model to see how closely they're related. This is the "Relationship Calculation" step that works on the GERMLINE output. (An illustrative classification sketch appears after these notes.)
  • Remind people that GERMLINE was stateless
  • Anytime you see an N-by-N comparison in a problem you are working on, it should send up huge red flags.
  • HBase holds the data. (It is a mix between a spreadsheet and a hash table.) Adding columns is easy, and having a very sparse matrix is fine. The key is the chromosome, the word value, and the position (which word). Each new sample adds a column to the table. A value of 1 in a cell indicates that this user has this word value at this location, so a row holds all the samples in our DNA pool with that same word at that position. This is really a pretty simple implementation. Remember: SIMPLE SCALES. (Simplified matching and HBase-update sketches appear after these notes.)
  • These are the updated tables after adding Baltar's information. We are only looking at 3 samples, chromosome #2, positions 0, 1, and 2. It is a very simple example of how the matching process works, but it is exactly what we do.
  • There were a whole bunch of characters on Battlestar Galactica!
  • Story #3: We would run samples through the old GERMLINE and the new Hadoop Jermline. For the most part, they always matched. We finally found a few runs where there were discrepancies. We had to pull in the Science Team to check – we had actually found a bug in the original GERMLINE implementation for an edge case. The clean-room implementation of the Hadoop code was "more correct" than the original C++ GERMLINE reference code. Very gratifying to see – but the truth is it had us concerned and confused for about 3 days. We had made the natural assumption that the base GERMLINE implementation (with a 'G') was 100% correct. That assumption was wrong.
  • [TRANSITION TO BILL] This slide is a huge relief. We've been released and steady for a while. One note: the curve for H2 is not totally flat; it is going up ever so slightly. No worries. We can always add more nodes to the cluster and reduce the time.
  • This is a graph of every step in the pipeline. You can see when we released Jermline, and it should be obvious that the GERMLINE/Jermline matching step is the orange line. You can see other steps where incremental change has improved performance. The light green is an initialization step that was greatly reduced when Jermline was released. Other drops represent adding more memory to the beefy box, changing the GERMLINE code to be new-by-all instead of all-by-all, and moving our ethnicity step to Hadoop. This is an "Agile" development story: working through the problems as they came up, making incremental changes, and getting a big payoff over time.
  • The "Beefy Box" would be a good candidate for a large database server or a single node in a heavily used distributed cache (Memcached or Redis).
  • [TRANSITION TO JEREMY]
  • Specific lessons learned: (1) hotspotting due to HBase 0.92's bad load balancer, so we had to upgrade to 0.94; (2) cache sensitivity of the application, so we had to run chromosomes separately; (3) timeouts from HBase because it was taking a long time to pull data. It turns out this was fine; our strategy was to pull a lot of data at once and let the compute nodes churn through it, so we just upped the timeout interval.
  • [TRANSITION TO BILL]
  • [BILL AND JEREMY]
  • [BILL]
  • [TRANSITION TO JEREMY] "We mentioned that DNA can help people find their distant relatives, even if their ancestors were brought to America as slaves. Here, we examined the DNA of African Americans of Senegalese ancestry, and by correlating that data with their family trees, we were able to piece together their family history. Looking at the maps, you can see a concentration of Senegalese ancestors in South Carolina. Prior to our analysis, there was some historical evidence of this, but using DNA and family trees, we could strongly support that thesis."
  • [BILL AND JEREMY]
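
Below is a minimal Java sketch of the "measure everything" principle from the notes above. The class, method, and step names are illustrative placeholders rather than Ancestry's actual code; the point is simply to record start time, end time, duration, and sample count for every pipeline step so the numbers can be put in pivot tables, normalized by sample size, and extrapolated against DNA pool size.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/** Minimal sketch: wrap a pipeline step and record its timing and sample count. */
public final class StepTimer {

    /** Runs a step and prints one CSV line: step name, start, end, seconds, sample count. */
    public static <T> T timeStep(String stepName, long sampleCount, Supplier<T> step) {
        Instant start = Instant.now();
        T result = step.get();
        Instant end = Instant.now();
        long seconds = Duration.between(start, end).getSeconds();
        // In practice this would go to a metrics store; stdout keeps the sketch small.
        System.out.printf("%s,%s,%s,%d,%d%n", stepName, start, end, seconds, sampleCount);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical usage: time a "phasing" step over a 1,000-sample batch.
        timeStep("phasing", 1_000, () -> {
            // ... run the real pipeline step here ...
            return null;
        });
    }
}
```

Plotting the normalized per-step durations against pool size is what separates the flat, linear, and quadratic steps and flags the time bombs early.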
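The "Relationship Calculation" note above sums the shared segment lengths for a pair and classifies how closely they are related. The production pipeline uses a statistical model (ERSA, per slide 30 of the transcript); the sketch below only illustrates the shape of that step, and every cutoff in it is invented for illustration.

```java
/**
 * Illustrative sketch only: classify a relationship from total shared IBD (in centimorgans).
 * The real pipeline uses a statistical model (ERSA); these cutoffs are made up for illustration.
 */
public class RelationshipSketch {

    static String estimateRelationship(double totalSharedCentimorgans) {
        if (totalSharedCentimorgans > 2300) return "parent/child or sibling";
        if (totalSharedCentimorgans > 575)  return "1st cousin range";
        if (totalSharedCentimorgans > 145)  return "2nd cousin range";
        if (totalSharedCentimorgans > 45)   return "3rd cousin range";
        if (totalSharedCentimorgans > 15)   return "4th cousin range";
        return "distant or unrelated";
    }

    public static void main(String[] args) {
        double totalIbd = 27.5; // hypothetical total of all shared segments, in cM
        System.out.println(estimateRelationship(totalIbd)); // prints "4th cousin range"
    }
}
```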
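The word-based matching described above (and walked through on slides 26 to 28 of the transcript) can be sketched in plain Java. This is a simplified in-memory version, not the production Hadoop/HBase code: it splits each sample into fixed-length words, buckets samples by word and position, and treats any bucket holding two or more samples as a shared word to be merged into segments later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified in-memory sketch of word-based matching (the GERMLINE/Jermline idea). */
public class WordMatchSketch {

    static final int WORD_LENGTH = 5;

    public static void main(String[] args) {
        Map<String, String> samples = new LinkedHashMap<>();
        samples.put("Starbuck", "ACTGACCTAGTTGAC");
        samples.put("Adama",    "TTAAGCCTAGTTGAC");
        samples.put("Baltar",   "TTAAGCCTAGGGGCG");

        // Hash table: key = word + "_" + position, value = the samples carrying that word there.
        Map<String, List<String>> buckets = new LinkedHashMap<>();
        for (Map.Entry<String, String> sample : samples.entrySet()) {
            String dna = sample.getValue();
            for (int pos = 0; (pos + 1) * WORD_LENGTH <= dna.length(); pos++) {
                String word = dna.substring(pos * WORD_LENGTH, (pos + 1) * WORD_LENGTH);
                buckets.computeIfAbsent(word + "_" + pos, k -> new ArrayList<>()).add(sample.getKey());
            }
        }

        // Any bucket shared by two or more samples is a matching word at that position.
        buckets.forEach((key, carriers) -> {
            if (carriers.size() > 1) {
                System.out.println(carriers + " share word " + key);
            }
        });
        // Adjacent shared positions would then be merged into segments and their lengths
        // summed for the relationship calculation; that step is omitted here.
    }
}
```

Running it shows that all three samples share the word at position 1, Starbuck and Adama also share position 2, and Adama and Baltar also share position 0, which matches the Battlestar Galactica example on the slides.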
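Building on that, here is a hedged sketch of the incremental "add one new sample" write to HBase, assuming the 0.92/0.94-era Java client API mentioned in the talk. The table name "dna_words", the column family "d", and the class name are placeholders invented for this example; only the row-key scheme ([CHROMOSOME]_[WORD]_[POSITION], qualifier = user ID, cell value = a byte set to 1) comes from the slides.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

/** Hedged sketch: add one new sample's words to the hash table stored in HBase. */
public class AddSampleSketch {

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // "dna_words" and column family "d" are placeholder names, not the real schema.
        HTable hashTable = new HTable(conf, "dna_words");

        String userId = "Baltar";
        String chromosome = "2";
        String[] words = {"TTAAG", "CCTAG", "GGGCG"};

        for (int pos = 0; pos < words.length; pos++) {
            // Row key: [CHROMOSOME]_[WORD]_[POSITION]; qualifier: user ID; value: byte 1.
            byte[] rowKey = Bytes.toBytes(chromosome + "_" + words[pos] + "_" + pos);
            Put put = new Put(rowKey);
            put.add(Bytes.toBytes("d"), Bytes.toBytes(userId), new byte[] {1});
            hashTable.put(put);
        }
        hashTable.close();
    }
}
```

Because only the new sample's words are written, the cost of adding a batch stays proportional to the batch rather than to the whole DNA pool, which is the contrast with the rebuild-everything GERMLINE flow on slides 33 to 35.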

QCon San Francisco Presentation, Scaling Ancestry DNA with HBase and Hadoop Presentation Transcript

  • 1. Scaling AncestryDNA Using Hadoop and HBase November 11, 2013 Jeremy Pollack (Engineer) and Bill Yetman (Manager) 1
  • 2. What Does This Talk Cover? What does Ancestry do? How does the science work? How did our journey with Hadoop start? DNA matching with Hadoop and HBase. Lessons learned. What's next? 2
  • 3. Ancestry.com Mission 3
  • 4. Discoveries are the Key We are the world's largest online family history resource • Over 30,000 historical content collections • 12 billion records and images • Records dating back to 16th century • 10 petabytes 4
  • 5. Discoveries in Detail The “eureka” moment drives our business 5
  • 6. Discoveries with DNA Spit in a tube, pay $99, learn your past. Autosomal DNA tests. Over 200,000+ DNA samples. 700,000 SNPs for each sample. 10,000,000+ cousin matches. [Chart: genotyped samples over time, 50,000 to 150,000] SNP: DNA molecule 1 differs from DNA molecule 2 at a single base-pair location (a C/T polymorphism) (http://en.wikipedia.org/wiki/Single-nucleotide_polymorphism) 6
  • 7. Network Effect – Cousin Matches [Chart: cousin matches (0 to 3,500,000) vs. database size (2,000 to 115,756)] 7
  • 8. Where Did We Start? The process before Hadoop 8
  • 9. What’s the Story? Cast of Characters (Scientists and Software Engineers) Scientists Think they can code: • Linux • MySQL • PERL and/or Python Software Engineers Think they are Scientists: • Biology in HS and College • Math/Statistics • Read science papers Pressures of a new business – Release a product, learn, and then scale Sr. Manager and 3 developers and 2 member Science Team 9
  • 10. What Did “Get Something Running” Look Like? Ethnicity Step and Matching (Germline) runs here “Beefy Box” Specifics: 1) Ran multiple threads for the two steps 2) Both steps were run in parallel 3) As the DNA Pool grew both steps required more memory Single Beefy Box – Only option is to scale Vertically 10
  • 11. Measure Everything Principle • Start time, end time, duration in seconds, and sample count for every step in the pipeline. Also the full end-to-end processing time • Put the data in pivot tables and graphed each step • Normalize the data (sample size was changing) • Use the data collected to predict future performance 11
  • 12. Challenges and Pain Points Performance degrades when DNA pool grows • Static (by batch size) • Linear (by DNA pool size) • Quadratic (Matching related steps) – Time bomb (Courtesy from Keith’s Plotting) 12
  • 13. New Matching Algorithm Hadoop and HBase 13
  • 14. What is GERMLINE? • GERMLINE is an algorithm that finds hidden relationships within a pool of DNA • GERMLINE also refers to the reference implementation of that algorithm written in C++ • You can find it here : http://www1.cs.columbia.edu/~gusev/germline/ 14
  • 15. So What's the Problem? • GERMLINE (the implementation) was not meant to be used in an industrial setting: stateless, single-threaded, prone to swapping (heavy memory usage) • GERMLINE performs poorly on large data sets • Our metrics predicted exactly where the process would slow to a crawl • Put simply: GERMLINE couldn't scale 15
  • 16. GERMLINE Run Times (in hours) [Chart: hours (0 to 25) vs. samples (2,500 to 60,000)] 16
  • 17. Projected GERMLINE Run Times (in hours) [Chart: GERMLINE run times and projected GERMLINE run times, hours (0 to 700) vs. samples (2,500 to 122,500)] 17
  • 18. The Mission: Create a Scalable Matching Engine ... and thus Jermline was born (aka "Jermline with a J") 18
  • 19. What is Hadoop? • Hadoop is an open-source platform for processing large amounts of data in a scalable, fault-tolerant, affordable fashion, using commodity hardware • Hadoop specifies a distributed file system called HDFS • Hadoop supports a processing methodology known as MapReduce • Many tools are built on top of Hadoop, such as HBase, Hive, and Flume 19
  • 20. What is MapReduce? 20
  • 21. What is HBase? • HBase is an open-source NoSQL data store that runs on top of HDFS • HBase is columnar; you can think of it as a weird amalgam of a hashtable and a spreadsheet • HBase supports unlimited rows and columns • HBase stores data sparsely; there is no penalty for empty cells • HBase is gaining in popularity: Salesforce, Facebook, and Twitter have all invested heavily in the technology, as well as many others 21
  • 22. Battlestar Galactica Characters, in an HBase Table. Six: is_cylon=true, hair_color=blonde, gender=female, is_final_five=no. Adama: is_cylon=false, hair_color=brown, gender=male, rank=admiral. 22
  • 23. Adding a Row to an HBase Table. Six: is_cylon=true, hair_color=blonde, gender=female, is_final_five=no. Adama: is_cylon=false, hair_color=brown, gender=male, rank=admiral. Baltar: is_cylon=false, hair_color=brown, gender=male. 23
  • 24. Adding a Column to an HBase Table. Six: is_cylon=true, hair_color=blonde, gender=female, is_final_five=no, friends="Kara Thrace, Saul Tigh". Adama: is_cylon=false, hair_color=brown, gender=male, rank=admiral. Baltar: is_cylon=false, hair_color=brown, gender=male. 24
  • 25. DNA Matching: How it Works. The Input: Starbuck : ACTGACCTAGTTGAC, Adama : TTAAGCCTAGTTGAC. Kara Thrace, aka Starbuck • Ace viper pilot • Has a special destiny • Not to be trifled with. Admiral Adama • Admiral of the Colonial Fleet • Routinely saves humanity from destruction 25
  • 26. DNA Matching : How it Works Separate into words 0 1 2 Starbuck : ACTGA CCTAG TTGAC Adama : TTAAG CCTAG TTGAC 26
  • 27. DNA Matching : How it Works Build the hash table 0 1 2 Starbuck : ACTGA CCTAG TTGAC Adama : TTAAG CCTAG TTGAC ACTGA_0 : Starbuck TTAAG_0 : Adama CCTAG_1 : Starbuck, Adama TTGAC_2 : Starbuck, Adama 27
  • 28. DNA Matching : How it Works Iterate through genome and find matches 0 1 2 Starbuck : ACTGA CCTAG TTGAC Adama : TTAAG CCTAG TTGAC ACTGA_0 : Starbuck TTAAG_0 : Adama CCTAG_1 : Starbuck, Adama TTGAC_2 : Starbuck, Adama Starbuck and Adama match from position 1 to position 2 28
  • 29. Does that mean they're related? ...maybe 29
  • 30. IBD to Relationship Estimation • This is basically a classification problem • We use the total length of all shared segments to estimate the relationship between two genetic relatives [Chart: ERSA probability (0.00 to 0.05) vs. total_IBD in cM (5 to 5000), curves m1 through m11] 30
  • 31. But Wait...What About Baltar? Baltar : TTAAGCCTAGGGGCG Gaius Baltar • Handsome • Genius • Kinda evil 31
  • 32. Adding a new sample, the GERMLINE way 32
  • 33. The GERMLINE Way Step one: Rebuild the entire hash table from scratch, including the new sample 0 1 2 Starbuck : ACTGA CCTAG TTGAC Adama : TTAAG CCTAG TTGAC Baltar : TTAAG CCTAG GGGCG ACTGA_0 : Starbuck TTAAG_0 : Adama, Baltar CCTAG_1 : Starbuck, Adama, Baltar TTGAC_2 : Starbuck, Adama GGGCG_2 : Baltar 33
  • 34. The GERMLINE Way Step two: Find everybody's matches all over again, including the new sample. (n x n comparisons) 0 1 2 Starbuck : ACTGA CCTAG TTGAC Adama : TTAAG CCTAG TTGAC Baltar : TTAAG CCTAG GGGCG ACTGA_0 : Starbuck TTAAG_0 : Adama, Baltar CCTAG_1 : Starbuck, Adama, Baltar TTGAC_2 : Starbuck, Adama GGGCG_2 : Baltar Starbuck and Adama match from position 1 to position 2 Adama and Baltar match from position 0 to position 1 Starbuck and Baltar match at position 1 34
  • 35. The GERMLINE Way Step three: Now, throw away the evidence! 0 1 2 Starbuck : ACTGA CCTAG TTGAC Adama : TTAAG CCTAG TTGAC Baltar : TTAAG CCTAG GGGCG ACTGA_0 : Starbuck TTAAG_0 : Adama, Baltar CCTAG_1 : Starbuck, Adama, Baltar TTGAC_2 : Starbuck, Adama GGGCG_2 : Baltar Starbuck and Adama match from position 1 to position 2 Adama and Baltar match from position 0 to position 1 Starbuck and Baltar match at position 1 You have done this before, and you will have to do it ALL OVER AGAIN. 35
  • 36. The Jermline Way. Step one: Update the hash table. Already stored in HBase: 2_ACTGA_0 : Starbuck=1; 2_TTAAG_0 : Adama=1; 2_CCTAG_1 : Starbuck=1, Adama=1; 2_TTGAC_2 : Starbuck=1, Adama=1. New sample to add: Baltar : TTAAG CCTAG GGGCG. Key : [CHROMOSOME]_[WORD]_[POSITION] Qualifier : [USER ID] Cell value : A byte set to 1, denoting that the user has that word at that position on that chromosome 36
  • 37. The Jermline Way. Step two: Find matches, update the results table. Already stored in HBase: row 2_Starbuck, column 2_Adama = { (1, 2), ...}; row 2_Adama, column 2_Starbuck = { (1, 2), ...}. New matches to add: Baltar and Adama match from position 0 to position 1; Baltar and Starbuck match at position 1. Key : [CHROMOSOME]_[USER ID] Qualifier : [CHROMOSOME]_[USER ID] Cell value : A list of ranges where the two users match on a chromosome 37
  • 38. The Jermline Way. Hash Table: 2_ACTGA_0 : Starbuck=1; 2_TTAAG_0 : Adama=1, Baltar=1; 2_CCTAG_1 : Starbuck=1, Adama=1, Baltar=1; 2_TTGAC_2 : Starbuck=1, Adama=1; 2_GGGCG_2 : Baltar=1. Results Table: 2_Starbuck : { 2_Adama : { (1, 2), ...}, 2_Baltar : { (1), ...} }; 2_Adama : { 2_Starbuck : { (1, 2), ...}, 2_Baltar : { (0,1), ...} }; 2_Baltar : { 2_Starbuck : { (1), ...}, 2_Adama : { (0,1), ...} } 38
  • 39. But wait ... what about Zarek, Roslin, Hera, and Helo? 39
  • 40. Run them in parallel with Hadoop! Photo by Benh Lieu Song 40
  • 41. Parallelism with Hadoop • Batches are usually about a thousand people • Each mapper takes a single chromosome for a single person • MapReduce Jobs: Job #1 : Match Words (updates the hash table); Job #2 : Match Segments (identifies areas where the samples match). (A minimal mapper sketch appears after the transcript.) 41
  • 42. How does Jermline perform? A 1700% performance improvement over GERMLINE! (Along with more accurate results) 42
  • 43. Run Times for Matching (in hours) [Chart: hours (0 to 25) vs. samples (2,500 to 117,500)] 43
  • 44. Run Times for Matching (in hours) [Chart: GERMLINE run times, Jermline run times, and projected GERMLINE run times, hours (0 to 180) vs. samples] 44
  • 45. Incremental Changes Over Time • Support the business, move incrementally and adjust • After H2, pipeline speed stays flat (Courtesy from Bill’s plotting) 45
  • 46. Dramatically Increased our Capacity Bottom line: Without Hadoop and HBase, this would have been expensive and difficult. 46
  • 47. And now for everybody's favorite part .... Lessons Learned 47
  • 48. Lessons Learned What went right? 48
  • 49. Lessons Learned : What went right? • This project would not have been possible without TDD • Two sets of test data : generated and public domain • 89% coverage • Corrected bugs found in the reference implementation • Has never failed in production 49
  • 50. Lessons Learned What would we do differently? 50
  • 51. Lessons Learned: What would we do differently? • Front-load some performance tests (HBase and Hadoop can have odd performance profiles; HBase in particular has some strange behavior if you're not familiar with its inner workings) • Allow a lot of time for live tests, dry runs, and deployment (these technologies are relatively new, and it isn't always possible to find experienced admins; be prepared to "get your hands dirty") 51
  • 52. What’s next for the Science Team? 52
  • 53. Our new lab in Albuquerque, NM 53
  • 54. Okay, for real this time. What’s next for the Science Team? 54
  • 55. More Accurate Results Potential matches Relevant matches 55
  • 56. Mapping Potential Birth Locations for Ancestors. Birth locations from 1750-1900 of individuals with large amounts of genetic ancestry from Senegal [Maps: two panels, 1750-1850 and 1800-1900; legend: over-represented birth location in individuals with large amounts of Senegalese ancestry; birth location common amongst individuals with W. African ancestry] 56
  • 57. How will the engineering team enable these advances? 57
  • 58. Engineering Improvements • Implement algorithmic improvements to make our results more accurate • Recalculate data as needed to support new scientific discoveries • Utilize cloud computing for burst capacity • Create asynchronous processes to continuously refine our data • Whatever science throws at us, we'll be there to turn their discoveries into robust, scalable solutions 58
  • 59. End of the Journey (for now) Questions? Tech Roots Blog: http://blogs.ancestry.com/techroots 59
  • 60. Appendix 60
  • 61. Appendix A. Who are the presenters? Bill Yetman 61 Jeremy Pollack
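
Slide 41 describes the two MapReduce jobs only at a high level. The skeleton below is a hypothetical sketch rather than the actual Jermline code: it assumes the first job's mapper receives one chromosome of one sample per input line, in a tab-separated "userId, chromosome, sequence" format invented for this example, and emits (chromosome_word_position, userId) pairs for the hash-table update.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Hypothetical sketch of a "Match Words" mapper (Job #1 on slide 41).
 * Assumed input: one line per sample per chromosome, e.g. "Baltar<TAB>2<TAB>TTAAGCCTAGGGGCG".
 * Output: key = [CHROMOSOME]_[WORD]_[POSITION], value = user ID.
 */
public class MatchWordsMapper extends Mapper<LongWritable, Text, Text, Text> {

    private static final int WORD_LENGTH = 5;

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] fields = line.toString().split("\t");
        String userId = fields[0];
        String chromosome = fields[1];
        String dna = fields[2];

        for (int pos = 0; (pos + 1) * WORD_LENGTH <= dna.length(); pos++) {
            String word = dna.substring(pos * WORD_LENGTH, (pos + 1) * WORD_LENGTH);
            // Same key scheme as the HBase hash table: [CHROMOSOME]_[WORD]_[POSITION]
            context.write(new Text(chromosome + "_" + word + "_" + pos), new Text(userId));
        }
    }
}
```

The second job (Match Segments) would then take the samples sharing each word bucket, merge adjacent positions into ranges, and write the results-table rows keyed by [CHROMOSOME]_[USER ID], as on slide 37.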