Easteros, new forecasting analytic platform
  
Jack Wang
Amazon forecasting
History
•  No friendly gateways to access historical forecasting snapshot (input, interim, output, etc.)
•  No friendly gateways to submit ad-hoc queries (troubleshooting) and new algorithms
•  SLA ETLs are hard to launch and maintain
•  …
  
Architecture
Cloud Based Data Warehouse
Hadoop (EMR) Clusters
EASTEROS: Router service
EASTEROS: Analytic Portal / CLI

Why Easteros?
•  Simple gateways for job submission and monitoring
– Access to each snapshot of pipeline run
•  Separate the big data software stack from users (analysts, scientists, retail in-stock managers)
  
Easteros: Router service
•  Users' perspective
– RESTful service to run Hive and Hadoop jobs.
– Auto-selects the proper EMR cluster based on cluster load
– Users don't need to set up and maintain clusters
– Sophisticated users can provide cluster configs
– Check job logs periodically (flushed to S3 every 5 minutes)
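The load-based cluster selection described above can be sketched as follows. This is only an illustration, not the Router service's actual logic: the `Cluster` fields, the job-count definition of load, and the `max_load` threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    """Hypothetical view of one EMR cluster as the router might track it."""
    cluster_id: str
    running_jobs: int
    capacity: int  # assumed: maximum concurrent jobs, must be > 0

    @property
    def load(self) -> float:
        # Assumed load metric: fraction of job slots in use.
        return self.running_jobs / self.capacity


def select_cluster(clusters: List[Cluster], max_load: float = 0.9) -> Optional[Cluster]:
    """Return the least-loaded cluster below max_load.

    Returns None when every cluster is too busy, which a caller could
    treat as the signal to spin up a new cluster.
    """
    candidates = [c for c in clusters if c.load < max_load]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.load)


clusters = [
    Cluster("na-1", running_jobs=8, capacity=10),
    Cluster("na-2", running_jobs=3, capacity=10),
    Cluster("cn-1", running_jobs=10, capacity=10),
]
chosen = select_cluster(clusters)
print(chosen.cluster_id if chosen else "spin up a new cluster")  # prints "na-2"
```

Returning `None` instead of raising keeps the "no cluster available" case explicit, so the caller decides whether to queue the job or start a new cluster.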
  
Easteros: Router service
•  SDEs' perspective
– Spin up new clusters automatically
– Override site-specific Hive/Hadoop configurations
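One way to picture the configuration override is a layered merge in which later layers win. The sketch below is an assumption about how such layering could work, not a description of Easteros itself; `resolve_config` and the three layer names are hypothetical, and the property values are illustrative only.

```python
from typing import Dict, Optional


def resolve_config(
    defaults: Dict[str, str],
    site_overrides: Dict[str, str],
    job_overrides: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Merge configuration layers; later layers take precedence.

    Assumed order: cluster defaults < site-specific overrides < per-job overrides.
    The inputs are not modified.
    """
    merged = dict(defaults)
    merged.update(site_overrides)
    merged.update(job_overrides or {})
    return merged


# Illustrative values only.
defaults = {"hive.exec.parallel": "false", "mapreduce.job.reduces": "1"}
site = {"mapreduce.job.reduces": "200"}
job = {"hive.exec.parallel": "true"}

effective = resolve_config(defaults, site, job)
print(effective)
```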
  
[Diagram: EASTEROS; DynamoDB metadata; Configure; Spin up; Submit Query/Algo; Synamo Archiver]

Easteros: service call
Easteros REST-API
HQL or jars (uploaded to S3)
Command args
Job priority
…
  
Easteros: service call
Easteros
Job id
Job logs
Result file
…
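Reading the two service-call slides as a request (HQL or jar in S3, command args, job priority) and a response (job id, job logs, result file), the payloads might be modeled as below. The slides do not give the actual API schema, so every field name, the JSON encoding, and the example S3 URIs are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class JobRequest:
    """Hypothetical request body for a job submission."""
    artifact_s3_uri: str  # HQL script or jar already uploaded to S3
    command_args: List[str] = field(default_factory=list)
    priority: int = 0  # assumed convention: higher means more urgent

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


@dataclass
class JobStatus:
    """Hypothetical response body: job id plus where to find logs and results."""
    job_id: str
    log_uri: Optional[str] = None
    result_uri: Optional[str] = None

    @classmethod
    def from_json(cls, body: str) -> "JobStatus":
        data = json.loads(body)
        return cls(
            job_id=data["job_id"],
            log_uri=data.get("log_uri"),
            result_uri=data.get("result_uri"),
        )


request = JobRequest(
    artifact_s3_uri="s3://example-bucket/queries/snapshot.hql",
    command_args=["--date", "2014-04-01"],
    priority=5,
)
payload = request.to_json()

# A made-up response, standing in for what the service might return.
response_body = '{"job_id": "job-123", "log_uri": "s3://example-bucket/logs/job-123/"}'
status = JobStatus.from_json(response_body)
print(payload)
print(status)
```

Because the result file is optional in this sketch, a client can parse the same response shape both while the job is still running and after it has finished.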
  
Jobs received in NA and CN clusters.
More and more ETLs are being migrated to Easteros.
Ad-hoc queries are quite stable.

Acknowledgement
Thanks
•  Rauser, John;
•  Touloumtzis, Michael; and
•  Bol, Colleen
  
