Sale Stock Engineering, represented by Garindra Prahandono, presented "High-Velocity GraphQL & Lambda-based Software Development Model" at the BandungJS event on May 14th, 2018.
4. Sale Stock - Overview
● E-commerce founded in late 2014
○ Internal Engineering founded in early 2015
○ Launched our in-house website in mid-2015, app in late-2015
● Concentrated on women's fashion
○ Around the Rp 100rb (Rp 100,000) range, as opposed to the > Rp 200rb range
○ Recently expanded to men’s fashion and women’s lifestyle
5. Sale Stock - Overview
● US$ 27 million Series B mid-2017
● Doubled revenue since then
● Margins comparable to European vertically-integrated unicorns
○ ASOS
○ Zalando
○ Boohoo.com
○ etc.
○ Margins still improving
7. Sale Stock - Multi-Layered Defensibles
● Vertically-integrated
● Business layers (all done in-house):
○ Merchandising
○ Manufacturing
○ Logistics
○ Customer Service
○ Finance
○ Etc.
● Inject software and automation in every layer of the vertical integration
8. Sale Stock - Multi-Layered Defensibles
● This translates to a *LOT* of software
● ~15 user-facing applications internally, ~200 services
● BUT, we decided to stay simple organizationally
○ Want it to make financial sense
○ 30+ software engineers, ~60 total in Engineering
○ A small team given the scale and diversity of portfolio
● How to do this without killing the engineers? Better tooling.
10. GraphQL is a query language for your API, and a server-side runtime for executing queries by using a type system you define for your data.
23. GraphQL
● No overfetching, no underfetching
● Super easy versioning
○ Single endpoint to maintain
○ Maintenance and deprecation can be done at the per-field level
○ Easy to support years-old clients this way
● But…
● Performance is a problem if not taken care of (a bit of a longer story)
● You’re still coding -- we can make this simpler!
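The "no overfetching, no underfetching" point can be illustrated with a toy field selector. This is a minimal sketch and not a GraphQL implementation: the `select` helper, the nested-dict selection format, and the sample product record are all assumptions made for illustration.

```python
from typing import Any

# Hypothetical record, standing in for what a backend would hold.
PRODUCT = {
    "id": 1,
    "name": "dress",
    "price": 99000,
    "seller": {"id": 7, "name": "Acme", "rating": 4.8},
}


def select(record: dict[str, Any], selection: dict[str, Any]) -> dict[str, Any]:
    """Return only the fields named in `selection`.

    A value of True selects a scalar field; a nested dict selects
    sub-fields of a nested object, mirroring a GraphQL selection set.
    """
    result: dict[str, Any] = {}
    for field, sub in selection.items():
        value = record[field]
        result[field] = select(value, sub) if isinstance(sub, dict) else value
    return result


# Roughly what the query `{ name seller { name } }` asks for.
response = select(PRODUCT, {"name": True, "seller": {"name": True}})
print(response)
```

The client names exactly the fields it needs, so the response carries neither unused fields (overfetching) nor requires a second round trip for the seller (underfetching).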
27. Laskar - Schema
● The primary way developers inject behavior into the platform
● Developers write a declarative, YAML-based manifest file called SDML (Sale Stock Domain Markup Language) -- no explicit imperative-style coding
● Entity is a core concept declared in the SDML
29. Laskar - Schema
● Entities are stored in the SQL-compliant persistence layer as a table
● Entities have “stored” properties
○ will be mapped to SQL table fields
● GraphQL CRUD queries and mutations will automatically be
“generated” for these entities
○ get<Entity>, create<Entity>, edit<Entity>, delete<Entity>, get<Entity>List, etc.
● Can also replace some straightforward SQL filtering logic
● Replaces tons of boilerplate -- can be thousands of lines of code!
○ Controller
○ DAL
○ etc.
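As a rough illustration of the generated operations listed above, the sketch below derives CRUD operation names from an entity declaration. The dictionary layout and the `generate_crud_operations` helper are hypothetical; the actual SDML format and Laskar's generator are not shown in this talk.

```python
from typing import Dict, List

# Hypothetical stand-in for an entity parsed from an SDML manifest.
product_entity: Dict[str, object] = {
    "name": "Product",
    "stored": {"id": "ID", "title": "String", "price": "Int"},
}


def generate_crud_operations(entity: Dict[str, object]) -> Dict[str, List[str]]:
    """Derive the names of the generated queries and mutations for an entity."""
    name = str(entity["name"])
    return {
        "queries": [f"get{name}", f"get{name}List"],
        "mutations": [f"create{name}", f"edit{name}", f"delete{name}"],
    }


operations = generate_crud_operations(product_entity)
print(operations["queries"])
print(operations["mutations"])
```

The point of the sketch is only that the operation set follows mechanically from the entity name, which is why no controller or DAL code has to be written by hand.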
31. ● Can also express SQL-based filtered reads!
○ Implements WHERE and ORDER BY clauses
○ GraphQL query is instantly available
○ An index is automatically created for the filter and sort fields
○ Good performance is automatic -- one less thing for developers to take care of
○ Value caching (provided by the cache layer) is also possible if you need the extra
performance
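A minimal sketch of how a declared filtered read could translate into SQL, assuming equality filters and a single sort field (the sort clause is ORDER BY in standard SQL). The `build_filtered_read` helper, the table name, and the index naming scheme are all hypothetical and not taken from Laskar itself.

```python
from typing import List, Tuple


def build_filtered_read(
    table: str, filter_fields: List[str], sort_field: str
) -> Tuple[str, str]:
    """Return (SELECT statement, CREATE INDEX statement) for a declared read."""
    # One equality condition per declared filter field, bound via placeholders.
    where = " AND ".join(f"{field} = ?" for field in filter_fields)
    select_sql = f"SELECT * FROM {table} WHERE {where} ORDER BY {sort_field}"
    # Index the filter fields first, then the sort field, so the query can use it.
    index_columns = ", ".join(filter_fields + [sort_field])
    index_name = "idx_" + "_".join([table] + filter_fields + [sort_field])
    index_sql = f"CREATE INDEX IF NOT EXISTS {index_name} ON {table} ({index_columns})"
    return select_sql, index_sql


select_sql, index_sql = build_filtered_read("product", ["category_id"], "created_at")
print(select_sql)
print(index_sql)
```

Because the possible filters are declared up front, the matching index can be derived at schema-deployment time rather than being left to each developer.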
32. Laskar - Schema
● End-to-end Development flow is all integrated within a tool
○ Local development
○ Schema migration
○ Schema deployment
36. Laskar - Gateway
● A horizontally-scaled set of GraphQL-compliant servers
● All Laskar-based requests (e.g. from front-end code) will go through
a single endpoint served by these servers.
● Dynamically performs GraphQL “behaviors” (queries, mutations,
types, etc.) based on currently-active Schema.
● Schemas are synced in realtime to the Gateway
38. Laskar - Gateway
● The simplicity of the Gateway gives us:
○ No binary / image deployment! (deployment is just a schema update!)
○ No more extra running processes for each service
○ Free distributed tracing and observability in general! (since all sub-requests are
made by the gateway)
39. But what if my use cases extend beyond CRUD?
What if I want to have “funkier” logic?
42. Laskar - Lambda
● Once you want to add custom logic, Lambda’s the answer
● A few integration points:
○ Entity’s computed properties
○ Root queries
● Entities can have “computed” properties (as opposed to “stored”), if
you want the properties to be computed on the fly with a Lambda
○ Product { score }
● Top-level queries that are outside of CRUD boundaries:
○ getTrendingProducts (needs access to data-intensive service)
43. Laskar - Lambda
● What is it?
○ A predefined, local function (sum, takeFirst, etc.)
○ A remote call
■ To a microservice
■ To a Cloud Function
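The two kinds of Lambda described above can be sketched as a small dispatch table. This is an illustration under assumptions: `LOCAL_LAMBDAS`, `call_remote_stub`, and `resolve_lambda` are hypothetical names, and the remote call is a placeholder that performs no network request.

```python
from typing import Any, Callable, Dict, List

# Predefined local lambdas, named after the examples on the slide.
LOCAL_LAMBDAS: Dict[str, Callable[[List[Any]], Any]] = {
    "sum": lambda values: sum(values),
    "takeFirst": lambda values: values[0] if values else None,
}


def call_remote_stub(values: List[Any]) -> Any:
    """Placeholder for a remote call to a microservice or Cloud Function."""
    # A real implementation would make a network request here.
    return {"score": len(values)}


def resolve_lambda(name: str, values: List[Any]) -> Any:
    """Run a registered local lambda, otherwise fall back to the remote stub."""
    handler = LOCAL_LAMBDAS.get(name, call_remote_stub)
    return handler(values)


print(resolve_lambda("sum", [1, 2, 3]))
print(resolve_lambda("takeFirst", ["a", "b"]))
```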
52. Laskar - Query Persistence
● GraphQL queries have to be pre-registered (termed “Persisted
Query”)
● At build time, GraphQL queries are extracted from the source code,
hashed, and sent to the server for persistence. IDs are then assigned
to each query.
● In production, queries are not sent in their raw text form, but
only as their IDs
53. Laskar - Query Persistence
● In production, queries are not sent in their raw text form, but
only as their IDs
● Increases security and performance
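A minimal sketch of the persisted-query idea, assuming the SHA-256 hash of the query text doubles as its ID (the talk only says that queries are hashed and IDs assigned, so this is a simplification). `PersistedQueryStore` is a hypothetical in-memory stand-in for the server-side registry.

```python
import hashlib
from typing import Dict


class PersistedQueryStore:
    """In-memory stand-in for the server-side registry of persisted queries."""

    def __init__(self) -> None:
        self._queries: Dict[str, str] = {}

    def register(self, query_text: str) -> str:
        """Build-time step: hash the query text and store it under that ID."""
        query_id = hashlib.sha256(query_text.encode("utf-8")).hexdigest()
        self._queries[query_id] = query_text
        return query_id

    def lookup(self, query_id: str) -> str:
        """Production step: resolve an ID to its text; unknown IDs are rejected."""
        if query_id not in self._queries:
            raise KeyError(f"unknown persisted query: {query_id}")
        return self._queries[query_id]


store = PersistedQueryStore()
query = "query { getProduct(id: 1) { title } }"
query_id = store.register(query)
print(store.lookup(query_id))
```

Rejecting unregistered IDs is where the security benefit comes from: clients cannot send arbitrary query text. Sending a short ID instead of the full text accounts for part of the performance benefit.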
54. Laskar - Query Planning
● Much like in a SQL database, the query text is transformed into a
Plan upon request, prior to execution.
● Let’s visualize.
59. Laskar - Query Planning
● Query planning is done only once per query-ID -- after which the
plan output is cached.
● Subsequent queries with that ID reuse the plan from the initial planning.
● Plan caching improves performance -- CPU usage is historically
extremely low in Laskar-Gateway
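Plan caching per query ID can be sketched as below. The planner here is a toy (it turns each whitespace-separated field name into one step) and every name is hypothetical; only the cache-on-first-request behaviour mirrors the description above.

```python
from typing import Dict, List

plan_cache: Dict[str, List[str]] = {}
planning_runs = 0  # counts how often the (expensive) planner actually runs


def plan_query(query_text: str) -> List[str]:
    """Toy planner: treat each whitespace-separated field name as one plan step."""
    global planning_runs
    planning_runs += 1
    return [f"fetch:{field}" for field in query_text.split()]


def get_plan(query_id: str, query_text: str) -> List[str]:
    """Return the cached plan for a query ID, planning only on the first request."""
    if query_id not in plan_cache:
        plan_cache[query_id] = plan_query(query_text)
    return plan_cache[query_id]


first = get_plan("q1", "title price")
second = get_plan("q1", "title price")
print(first, planning_runs)
```

Since persisted queries give every query a stable ID, the ID is a natural cache key and the planner runs once per distinct query rather than once per request.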
60. Laskar - Query Execution
● Within the same stage, multiple requests to stored properties are
de-duped and batched into a single SQL query to the persistence
layer.
● Requests to similar lambdas are also de-duped (with some caveats).
● Automatically uses cached data for SQL-layer load-shedding.
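The de-duping and batching of stored-property requests can be illustrated with an in-memory SQLite table. This is a sketch under assumptions: the `product` table, its rows, and `batch_load_titles` are invented for the example, and the Gateway's real execution engine is not shown in the talk.

```python
import sqlite3
from typing import Dict, List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)", [(1, "Shirt"), (2, "Dress"), (3, "Hat")]
)


def batch_load_titles(requested_ids: List[int]) -> Tuple[Dict[int, str], int]:
    """De-dupe the requested IDs and fetch them with one SQL query.

    Returns the id -> title mapping and the number of SQL queries issued.
    """
    unique_ids = sorted(set(requested_ids))
    if not unique_ids:
        return {}, 0
    placeholders = ", ".join("?" for _ in unique_ids)
    rows = conn.execute(
        f"SELECT id, title FROM product WHERE id IN ({placeholders})", unique_ids
    ).fetchall()
    return {row[0]: row[1] for row in rows}, 1


titles, query_count = batch_load_titles([1, 2, 1, 2, 1])
print(titles, query_count)
```

Five property requests collapse into one `IN (...)` query over two distinct IDs, which is the load reduction the bullets above describe.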
62. Laskar - Interfaces
● Admin interfaces for the data can also be created automatically
● No need for developers to write React / front-end code
● Everything is derived from schema inspection and a simple
config file.
64. Laskar - Where is it used currently?
● Sale Stock homepage! (specifically the banner management system)
● Product category management
● Image management system
● Promotion & landing page management system
● Many more in the pipeline!
65. Laskar - Benefits
● Far fewer lines of code -- virtually no boilerplate
● Can be used in 60-70% of use cases
○ If it’s simple CRUD, then it’s used
● Instant performance -- easy caching, indexing
○ Index creation is automatic
○ In the declaration, possible queries are defined upfront -- possible to force-create
indexes.
○ Reduces the number of avenues for possible developer mistakes
○ Caching is as easy as a one-liner -- invalidations are automatic
66. Laskar - Benefits
● Automatic schema migration
● In one case, 20k lines of code were replaced by < 300 lines of YAML
code -- identical feature set
● Instant deployment
● Huge hardware requirement savings
○ No set of new hardware needed for each new feature
○ Especially for low-traffic features
● Non-programmers can theoretically implement features
69. Careers
● We’re hiring!
● Practically all modern-day startup positions are available, but
highlighting two:
○ Front-end Platform Engineer
○ Data Platform Engineer
● https://careers.salestock.io
70. Careers - Front-end Platform Engineer
● Join a team with a knack for the bleeding edge
● We used:
○ React in early 2015 (with in-house server-side rendering)
○ React Native (RN) in early 2016
○ GraphQL in early 2016
○ Single-codebase web/app React+RN in early 2016
○ Apollo in late 2016
● Knack for implementing custom tooling where it makes sense
○ Laskar and lots of other internal tooling
● Write apps for millions of users
71. Careers - Data Platform Engineer
● Join a team that’s building a next-generation data platform that
empowers data scientists, analysts, and software engineers to
collaborate smoothly and fully own their data science and pipelines.
● We craft and create the end-to-end platform with:
○ Jupyter Notebook
○ Apache Spark
○ Google Dataflow, Dataproc
○ Google BigQuery
○ TensorFlow
○ Kubeflow
○ Kubernetes
○ Lots of custom tooling