The document discusses approaches for off-loading data, applications, and users from an existing enterprise data warehouse system to Cloudera Enterprise Data Hub (EDH). It recommends starting with a specific use case or a small non-critical application to gain experience and work out any issues, then expanding the migration gradually through partial or full off-loads. Care must be taken to keep the existing and new systems in sync during the migration and to ensure dependencies are addressed. Automating as much of the process as possible with tools like Sqoop helps, but some steps may still require manual effort, such as data type mappings and connector options.
2. Agenda
• What does it mean
• Why
• Approaches
• Things to consider
• Questions
CONFIDENTIAL ‹#› - RESTRICTED
3. What does it mean..
data
applications
users
.. from existing systems (enterprise data warehouses) to Cloudera Enterprise Data Hub (EDH)
4. Why.. ..a number of reasons...
.. Cost
.. Flexibility – structured/un-structured
5. Approaches..
.. Specific
.. Use Case
.. Application
.. Partial
.. Full
6. Specific..
.. This is the way to start..
.. Pick a use case or a small-to-medium non-critical application
.. End-to-end
7. Why Specific..
.. Reveal ah-ha moments
.. Gain experience
.. Iron out support, operations, and admin issues
.. In some cases a complete switch may not be feasible; still do end-to-end, but feed needed data back to the old system
8. Partial..
.. Now that in-house experience and expertise have been built, focus on extending the migration effort to other areas
.. Follow the same pattern, end-to-end
9. Full..
.. In some cases a full off-load may be feasible
.. But don’t fool yourself
.. Existing systems might have been there for years
.. May have 100s of TB, hundreds of databases, thousands of tables, views, stored procs, scripts, macros, workflows, and reports, and dozens of apps pointed at it
.. This may entail having many partial off-loads staged, verified, and ready to go before a full migration
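One simple way to treat a partial off-load as "verified" is to compare per-table row counts between the old warehouse and the EDH. The sketch below assumes the counts have already been collected into dictionaries by whatever means fits the environment; the function name and the sample tables are hypothetical, and a row-count match is only a first check, not proof that the data is identical.

```python
from typing import Dict, List


def find_mismatches(source_counts: Dict[str, int],
                    target_counts: Dict[str, int]) -> List[str]:
    """Return tables whose row counts differ or that are missing on the target."""
    mismatched = []
    for table, expected in sorted(source_counts.items()):
        actual = target_counts.get(table)  # None when the table was never loaded
        if actual != expected:
            mismatched.append(table)
    return mismatched


if __name__ == "__main__":
    # Hypothetical counts gathered from the old warehouse and from the EDH.
    warehouse = {"orders": 1200, "customers": 300, "returns": 45}
    edh = {"orders": 1200, "customers": 299}
    print(find_mismatches(warehouse, edh))  # ['customers', 'returns']
```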
10. Planning..
.. How to keep existing systems in sync
.. Feedback/keep-alive loop
.. Processed data may need to be pumped back and forth
.. Keeping IDs in sync (deciding system of record)
.. Impact on existing environment
.. While migrating existing data
.. While keeping old and new systems in sync
.. Number of connections
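The "number of connections" point can be made concrete with a little arithmetic. The sketch below assumes that each parallel import task holds one connection to the source database for the duration of the job, which is how Sqoop mappers typically behave, and that the DBA has agreed on a connection budget for the migration. The function name and the numbers are illustrative only.

```python
def max_parallel_jobs(connection_budget: int, mappers_per_job: int) -> int:
    """How many import jobs can run at once without exceeding the budget.

    Assumes every mapper in every job holds exactly one source connection.
    """
    if connection_budget < 0:
        raise ValueError("connection_budget must not be negative")
    if mappers_per_job < 1:
        raise ValueError("mappers_per_job must be at least 1")
    return connection_budget // mappers_per_job


if __name__ == "__main__":
    # Hypothetical budget of 20 connections, 4 mappers per import job.
    print(max_parallel_jobs(20, 4))  # 5
```

If the result is zero, the job itself needs fewer mappers before it can run inside the budget at all.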
11. Sqoop..
.. Will help significantly in migrating both data and schemas
.. Automate as much as possible
.. Give the script a DB and a list of tables (or ones to avoid) and have it take care of the rest
.. But will still involve manual touch points
.. Data types
.. Not all data types may be supported
.. Mappings
.. Connectors – go through options properly
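The "give the script a DB and a list of tables" idea might look roughly like the sketch below, which only builds Sqoop command lines and prints them for review rather than running anything. The options used (--connect, --username, --password-file, --table, --hive-import, --num-mappers, --map-column-hive) are standard Sqoop 1 import options; the JDBC URL, user, file path, table names, and the function name are hypothetical placeholders. The per-table type overrides are the manual touch point: someone still has to decide which columns need them.

```python
import shlex
from typing import Dict, Iterable, List, Optional


def build_import_commands(
    jdbc_url: str,
    username: str,
    password_file: str,
    tables: Iterable[str],
    skip: Iterable[str] = (),
    hive_type_overrides: Optional[Dict[str, Dict[str, str]]] = None,
    num_mappers: int = 4,
) -> List[List[str]]:
    """Build one `sqoop import` argument list per table that is not skipped."""
    overrides = hive_type_overrides or {}
    skipped = set(skip)
    commands = []
    for table in tables:
        if table in skipped:
            continue
        cmd = [
            "sqoop", "import",
            "--connect", jdbc_url,
            "--username", username,
            "--password-file", password_file,
            "--table", table,
            "--hive-import",
            "--num-mappers", str(num_mappers),
        ]
        column_types = overrides.get(table)
        if column_types:
            # Manually chosen Hive types for columns the default mapping gets wrong.
            mapping = ",".join(f"{col}={hive_type}"
                               for col, hive_type in column_types.items())
            cmd += ["--map-column-hive", mapping]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # All connection details and table names below are made up for illustration.
    for cmd in build_import_commands(
        jdbc_url="jdbc:mysql://dw.example.com/sales",
        username="etl",
        password_file="/user/etl/.sqoop-password",
        tables=["orders", "audit_log", "customers"],
        skip=["audit_log"],
        hive_type_overrides={"orders": {"created_at": "STRING"}},
    ):
        print(shlex.join(cmd))
```

Printing the commands first keeps a human in the loop for the data type and connector decisions before any load touches the source system.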
12. Key takeaways..
.. Start with specific use case
.. Identify dependencies and keep-alive processes
.. Avoid scope creep.. "Oh no, we need that dataset too."
.. Engage developers, testers, and business owners early
.. Could be complex, but done properly could result in significant savings, flexibility, and new capabilities..