#bigdata #migration #aws #cloudcomputing
Organizations processing large amounts of data face challenges such as high maintenance costs and administrative overhead while struggling to provision resources, cope with uneven workloads at scale, and pursue innovation. AWS offers a wide selection of flexible on-demand computing resources, durable and inexpensive persistent storage, and managed services that provide current, familiar environments for building and running big data applications.
You can find more in our blog post: https://lcloud.pl/en/big-data-migration-to-the-cloud/
Follow us on our social media channels:
LCloud Blog https://lcloud.pl/en/blog/
Facebook https://bit.ly/2tCqBJS
Twitter https://twitter.com/LCLOUD16
SlideShare https://www.slideshare.net/LCloud
LinkedIn https://bit.ly/2syaQCr
3. Amazon EMR is a service for cost-effective and fast processing of large amounts of data. It runs the Hadoop and Spark frameworks on Amazon EC2 and Amazon S3, enabling efficient processing of large data sets in workloads such as indexing, data mining, machine learning, and financial analysis.
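As an illustration of how such a cluster might be started programmatically, here is a minimal sketch using boto3's `run_job_flow` call. The cluster name, release label, instance types, roles, and log bucket below are illustrative assumptions, not values from this presentation:

```python
# Sketch: launching an EMR cluster with Spark and Hadoop installed.
# Release label, instance types, roles and names are illustrative assumptions.

def build_emr_request(name, log_uri, core_count=2):
    """Build the parameter dict for the EMR RunJobFlow API call."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed EMR release
        "Applications": [{"Name": "Spark"}, {"Name": "Hadoop"}],
        "LogUri": log_uri,  # e.g. "s3://my-log-bucket/emr-logs/"
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge",
                 "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge",
                 "InstanceCount": core_count},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,  # terminate when steps finish
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

def launch_cluster(name, log_uri):
    """Send the request with boto3 (needs AWS credentials; not called here)."""
    import boto3
    emr = boto3.client("emr")
    return emr.run_job_flow(**build_emr_request(name, log_uri))["JobFlowId"]
```

Separating the parameter builder from the API call makes the configuration easy to review and test before a cluster is ever launched.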
4. Amazon S3 (Simple Storage Service) is durable, highly scalable object storage. In big data architectures it acts as the central data store: services such as Amazon EMR and AWS Glue read their input from S3 and write their results back to it.
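One common pattern for organising such a data store is date-partitioned object keys, which query engines can prune efficiently. The sketch below is a minimal illustration; the bucket, prefix, and file names are assumptions:

```python
# Sketch: date-partitioned S3 key layout for a data lake; bucket, prefix
# and file names are illustrative assumptions.
from datetime import date

def partitioned_key(prefix, table, day, filename):
    """Build an S3 object key partitioned by year/month/day."""
    return (f"{prefix}/{table}/year={day.year}/"
            f"month={day.month:02d}/day={day.day:02d}/{filename}")

def upload(local_path, bucket, key):
    """Upload with boto3 (needs AWS credentials; not called in this sketch)."""
    import boto3
    boto3.client("s3").upload_file(local_path, bucket, key)

key = partitioned_key("raw", "clickstream", date(2019, 5, 1), "part-0001.json")
print(key)  # raw/clickstream/year=2019/month=05/day=01/part-0001.json
```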
5. AWS Glue is a fully managed extraction, transformation, and loading (ETL) service that makes it easier for customers to prepare and load data for analysis. It also allows you to configure, coordinate, and monitor complex data flows.
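To make the ETL idea concrete, here is the transform step of such a flow sketched in plain Python; a real Glue job would express the same logic with PySpark, and the field names are illustrative:

```python
# Sketch of the transform step of an ETL flow, in plain Python.
# A real AWS Glue job would express this with PySpark; field names
# are illustrative.

def transform(records):
    """Normalize raw records: drop incomplete rows, rename and cast fields."""
    out = []
    for rec in records:
        if "user_id" not in rec or "amount" not in rec:
            continue  # drop incomplete rows
        out.append({
            "userId": str(rec["user_id"]),
            "amountCents": int(round(float(rec["amount"]) * 100)),
        })
    return out

raw = [
    {"user_id": 7, "amount": "19.99"},
    {"user_id": 8},  # incomplete, dropped
]
print(transform(raw))  # [{'userId': '7', 'amountCents': 1999}]
```

The extract and load steps around this would read the raw records from, and write the cleaned records back to, a store such as S3.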
7. Open-source software
Apache Hadoop is software for distributed storage and processing of large data sets on computer clusters.
Apache Spark is a programming platform and engine for distributed computing.
▪ Hadoop is designed to efficiently support batch processing, while Spark is designed to efficiently handle data in real time.
▪ Hadoop is a high-latency computing framework with no interactive mode, while Spark offers low-latency computing and can process data interactively.
▪ Apache Spark is also a component of the Hadoop ecosystem. Spark’s main idea is in-memory processing.
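Both frameworks implement the map/shuffle/reduce model; the difference is that Hadoop MapReduce materialises intermediate results on disk while Spark keeps them in memory. A plain-Python word count sketches the model itself:

```python
# Sketch: the map/shuffle/reduce model that Hadoop MapReduce executes from
# disk and Spark executes mostly in memory, shown as a plain-Python word count.
from collections import defaultdict

def word_count(lines):
    # Map: emit (word, 1) pairs; shuffle: group by word; reduce: sum counts.
    counts = defaultdict(int)
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
    return dict(counts)

print(word_count(["big data", "big clusters"]))
# {'big': 2, 'data': 1, 'clusters': 1}
```

In PySpark the same computation would typically be written with the `flatMap`, `map`, and `reduceByKey` transformations on an RDD.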
9. There are a few approaches to cloud migration, but these three let you make conscious decisions about your architecture.
3 APPROACHES TO THE MIGRATION PROCESS
10. This approach relies on redesigning the existing infrastructure to make full use of cloud computing. It is based on analysing the existing architecture and the way it was designed, which yields benefits such as lower memory and hardware costs and increased operational flexibility for the business.
Re-architecting
11. This is an ideal solution when we need a more efficient infrastructure. By transferring the workloads of the existing environment as they are, we avoid most of the changes that re-architecting would introduce. Fewer changes also mean less risk of unexpected work, so your solution can return to service or reach the market sooner.
Lift and shift
12. This is a combination of the two previous approaches. Lift and shift handles the part that must be migrated quickly, while re-architecting supports redesigning the solutions that need it. This approach offers a great deal of flexibility, letting you experiment with cloud solutions and gain the necessary experience before you decide to move to the cloud permanently.
Hybrid
14. Knowing the options for migrating to the cloud, let’s move on to prototyping. Learning new solutions always involves a learning stage, and practice is its best form. Prototyping should be a crucial step when implementing new services and products. The scenario is the same as before: it is cheaper to verify the application at the prototyping stage. The same goes for instance types. The worst assumption is that an application running in an on-premise environment will behave the same way in the cloud; many factors affect this. It’s worth running applications in a test environment under loads that can occur in the real world.
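A minimal sketch of generating such load in a test environment; the `handle_request` handler below is a placeholder assumption standing in for your application's real entry point:

```python
# Sketch: generating concurrent load against an application handler in a
# test environment. `handle_request` is a placeholder for the real entry point.
from concurrent.futures import ThreadPoolExecutor

def handle_request(payload):
    return {"status": 200, "echo": payload}  # placeholder handler

def run_load(n_requests, concurrency):
    """Fire n_requests calls with the given concurrency, return all results."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(handle_request, range(n_requests)))

results = run_load(100, concurrency=8)
print(sum(r["status"] == 200 for r in results))  # 100
```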
16. 1. Make a list of all potential assumptions and uncertainties, keeping in mind which ones may have the greatest impact on the environment.
2. Select and implement the riskiest aspects of the migration first.
3. Set your goals in advance and don’t be afraid to ask questions. The answers will help verify the project or explain how a given solution works.
4. Always prototype under conditions similar to those in which you want to operate. You can start with a smaller environment or feature set and then scale up.
17. 5. Use iteration and Continuous Integration as the basis for implementation tests. With an automated environment and scripts, you can run the same test in several environments.
6. Ask an expert to verify the test configuration and environment. This helps eliminate errors and check that the results are not skewed.
7. Running the tests correctly lets you remove variables that may stem from dependencies.
8. Document the test results and ask for verification to ensure they are reliable.
18. 9. Don’t take any assumption for granted! In the big data area, too many factors affect performance, functionality, and cost.
10. Prototyping aims to verify the project’s assumptions with a fairly high degree of certainty. In general, the more effort put into the prototype and the more factors taken into account, the greater the confidence that the project will work in a production environment.
11. And above all, don’t be afraid to seek help – from AWS Authorized Partners, AWS Support, and the documentation.
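Point 5 above suggests running the same automated test in several environments; here is a minimal sketch of such a parameterised driver, where the environment names and settings are illustrative assumptions:

```python
# Sketch: one automated test driver run against several environments.
# Environment names and settings below are illustrative assumptions.
ENVIRONMENTS = {
    "dev":     {"instance_type": "t3.medium",  "dataset": "sample-1pct"},
    "staging": {"instance_type": "m5.xlarge",  "dataset": "sample-10pct"},
    "prod":    {"instance_type": "m5.2xlarge", "dataset": "full"},
}

def run_suite(env_name, runner):
    """Run the same test callable with one environment's configuration."""
    cfg = ENVIRONMENTS[env_name]
    return runner(cfg)

# The same check executed in every environment:
report = {env: run_suite(env, lambda cfg: bool(cfg["dataset"]))
          for env in ENVIRONMENTS}
print(report)  # {'dev': True, 'staging': True, 'prod': True}
```

Keeping the test callable identical and varying only the configuration makes results across environments directly comparable.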
19. Any questions?
We can help you!
Feel free to contact us
kontakt@lcloud.pl
www.lcloud.pl
Thank you for your time!
All source materials in the presentation have been appropriately marked.