Dremel - Interactive Analysis of Web-Scale Datasets
Paper Review
Arinto Murdopo
November 8, 2012
1 Motivation
The main motivation of the paper is the inability of existing big data infrastructure
(MapReduce/BigTable and Hadoop) to perform fast ad-hoc explorations and queries over
web-scale datasets. In this context, ad-hoc queries are on-the-fly queries that users
issue whenever they need them. Ad-hoc queries are expected to execute quickly, so that
users can explore the datasets interactively.
A secondary motivation of the paper is the need for a more user-friendly query mechanism
that makes data analytics on big data infrastructure easier and faster. Pig and Hive try to
solve this challenge by providing a SQL-like query language on Hadoop, but they solve
only the usability issue, not the performance issue.
Solving these issues enables faster, more efficient, and more effective data analytics on big
data infrastructure, which implies higher productivity for the data scientists and engineers
working in big data analytics. Hence, these issues are interesting and important
to solve.
2 Contributions
The main contribution of Dremel is a high-performance technique for processing web-scale
datasets with ad-hoc queries. The mechanism solves the following challenges:
1. Inefficiency in storing data for ad-hoc queries.
Most of the time, ad-hoc queries do not need all the available fields/columns of
a table. The authors therefore propose a columnar data model that improves data
retrieval performance. This was a novel solution, because the well-known big data
processing platforms (such as MapReduce on Hadoop) worked with a record-structured
data model at that time.
2. Performance and scalability challenges in processing the data.
The multi-level serving tree model allows a high degree of parallelism in processing the
data. It also allows optimization of the query at each level of the tree. Query scheduling
allows prioritization of execution. Together, these techniques solve the performance
challenge in data processing.
The scalability challenge is also solved, since more nodes can easily be added to process
more data in the multi-level serving tree model. Separating the query scheduler from the
root server is a good decision, because it decouples query-scheduling responsibility from job
tracking (note that job tracking is performed by the root server). This scheduler
model yields higher scalability when the number of leaf servers is huge.
The minor contribution of this paper is the use of an actual Google dataset in the
experiment. This gives other researchers insight into the practical magnitude of
web-scale datasets.
3 Solutions
3.1 Columnar Data Model
As mentioned before, ad-hoc queries need only a small subset of the fields/columns of a table.
A record-structured data model introduces significant inefficiencies: in order
to retrieve the needed fields/columns, the whole record must be read, including the
unnecessary fields/columns. To reduce these inefficiencies, a columnar data model is
introduced, which allows reading only the fields/columns needed by an ad-hoc
query. This model reduces data-retrieval inefficiencies and increases speed.
The authors also explain how to convert from the record-structured data model to the
columnar model and vice versa.
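The benefit of the columnar layout can be sketched with a toy example (the records and field names below are invented for illustration and are not the paper's schema): shredding records into per-field columns lets a query touch only the fields it needs, instead of scanning whole records.

```python
# Toy illustration of record-oriented vs. columnar storage.
# Hypothetical flat records; Dremel's real model also handles nested,
# repeated fields via repetition/definition levels, not shown here.
records = [
    {"doc_id": 1, "url": "http://a", "clicks": 10},
    {"doc_id": 2, "url": "http://b", "clicks": 3},
    {"doc_id": 3, "url": "http://c", "clicks": 7},
]

def sum_clicks_row_store(records):
    # Record-oriented: every whole record is read to get one field.
    return sum(r["clicks"] for r in records)

def to_columns(records):
    # Shred records into one list per field (lossless for flat records).
    return {field: [r[field] for r in records] for field in records[0]}

def sum_clicks_column_store(columns):
    # Columnar: only the "clicks" column is touched.
    return sum(columns["clicks"])

columns = to_columns(records)
assert sum_clicks_row_store(records) == sum_clicks_column_store(columns) == 20
```

Both layouts yield the same answer; the columnar one simply avoids reading the `doc_id` and `url` data at all, which is where the I/O savings for ad-hoc queries come from.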
3.2 Multi-level Serving Tree
The authors use a multi-level serving tree for columnar data processing. A typical multi-level
serving tree consists of a root server, several intermediate servers, and many leaf servers.
There are two reasons behind the multi-level serving tree:
1. Ad-hoc queries characteristically return result sets of small or medium size, so the
overhead of processing these kinds of data in parallel is small.
2. It provides a high degree of parallelism for processing small or medium-sized data.
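The aggregation flow through such a tree can be sketched as follows (a minimal sketch: the tablet contents and the fan-out of 2 are assumptions for illustration). Leaf servers aggregate their own tablets in parallel, intermediate servers merge partial results, and the root merges the final answer.

```python
# Toy multi-level serving tree for an aggregation query (here: SUM).
tablets = [[1, 2], [3, 4], [5, 6], [7, 8]]  # data held by the leaf servers

def leaf_server(tablet):
    # Each leaf aggregates only its own tablet.
    return sum(tablet)

def intermediate_server(partials):
    # Merge partial results coming from child servers.
    return sum(partials)

def root_server(tablets, fanout=2):
    # Leaves run in parallel in the real system; sequential here.
    partials = [leaf_server(t) for t in tablets]
    # Group leaves under intermediate servers, then merge at the root.
    merged = [intermediate_server(partials[i:i + fanout])
              for i in range(0, len(partials), fanout)]
    return intermediate_server(merged)

assert root_server(tablets) == sum(sum(t) for t in tablets)  # == 36
```

Because each level only merges small partial results, this shape works well exactly when result sets are small or medium-sized, matching reason 1 above.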
3.3 Query Dispatcher
The authors propose a mechanism to regulate resource allocation across leaf servers; the
mechanism is handled by a module called the query dispatcher. The query dispatcher works
in units of slots (a slot is a processing unit available for execution). It allocates an
appropriate number of tablets to each slot, and it deals with stragglers by moving tablets
from slow straggler slots to new slots.
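A minimal sketch of this dispatching idea (the round-robin policy, function names, and numbers below are invented for illustration, not the paper's implementation): tablets are assigned to slots, and a straggler slot's remaining work can be handed to a healthy slot.

```python
import collections

def dispatch(tablet_ids, num_slots):
    # Toy policy: assign tablets to slots round-robin.
    slots = collections.defaultdict(list)
    for i, tablet in enumerate(tablet_ids):
        slots[i % num_slots].append(tablet)
    return dict(slots)

def reassign_straggler(slots, straggler, target):
    # Move the straggler slot's remaining tablets to a healthy slot,
    # mimicking how the dispatcher sidesteps slow execution units.
    slots[target].extend(slots[straggler])
    slots[straggler] = []
    return slots

slots = dispatch(range(6), num_slots=3)   # {0: [0, 3], 1: [1, 4], 2: [2, 5]}
slots = reassign_straggler(slots, straggler=2, target=0)
assert slots[2] == [] and sorted(slots[0]) == [0, 2, 3, 5]
```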
3.4 SQL-like Query
People with a SQL background can easily use Dremel to perform data analytics on web-scale
datasets. Unlike Pig and Hive, Dremel does not convert SQL-like queries into
MapReduce jobs, so Dremel should have faster execution times than Pig
and Hive.
4 Strong Points
1. Identification of the characteristics of ad-hoc query datasets.
The authors correctly identify the main characteristic of the data returned by
ad-hoc queries: only a small number of fields are used.
This finding allows the authors to develop the columnar data model and to use the
multi-level serving tree to process it.
2. Fast and lossless conversion between the nested record structure and the columnar
data model.
Although columnar data models have been used in other related work, the fast and
lossless conversion algorithm that the authors propose is novel and one of the key
contributions of Dremel.
3. Magnitude and variety of the experimental datasets.
The magnitude of the datasets is huge and practical. This magnitude and variety
strengthen the evidence for the Dremel solution and demonstrate its high
performance.
5 Weak Points
1. A record-oriented data model can still outperform the columnar data model. This is
the main shortcoming of Dremel; however, credit must be given to the authors,
since they do not hide this shortcoming and they provide some insight into
it.
2. Performance analysis of the data model conversion is not discussed. The authors claim
that the conversion is fast, but they do not support this claim with experimental
data.
3. Use of the "cold" setting in the Local Disk experiment. In the Local Disk experiment, the
authors mention that "all reported times are cold". Using a cold setting in database
or storage benchmarking is not recommended, because the results are heavily biased
by disk-access performance. When the database is "cold", query execution
performance at the start of the experiment depends strongly on the disk speed: most
operations involve moving data from disk to the OS cache, and execution time is
dominated by disk access.