Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Apache Spark from Notebook to Cloud Native Application with Rebecca Simmonds

311 views

Published on

Data engineering teams love Apache Spark because it’s powerful and easy to manage, but managing a shared resource for experimental analyses and queries is very different from developing production applications in contemporary cloud environments: the gap between understanding Spark and being able to deploy and manage it in production can be vast.

This session will cover a developer’s journey learning Spark and using it to develop a containerized, cloud native application with analysis and visualization components. More specifically, these topics will be covered:

• Exploratory analysis in a Jupyter notebook running against an ephemeral Spark cluster
• Using PySpark for loading and analyzing data from external data sources like PostgreSQL
• Transforming your notebook into a cloud-native application deploying your application in containers on Kubernetes
• PySpark API functionality that you didn’t know you needed.

So, whether you’re an application developer or a Spark expert this session is for you. If you’re a developer wanting to deploy a spark cluster into production, this session will help guide you through techniques to make this transition easier and quicker. However, if you’re an expert, then this talk should give you some insight into how application developers work and help you to coordinate with the development team.

Published in: Data & Analytics
  • Login to see the comments

  • Be the first to like this

Apache Spark from Notebook to Cloud Native Application with Rebecca Simmonds

  1. 1. #Py2SAIS Spark from Notebook to Cloud Native Application Rebecca Simmonds Senior Software Engineer rsimmond@redhat.com @becky_simmonds
  2. 2. #Py2SAIS To empower others with the tools and tips to go from prototype to production using Apache Spark Aim
  3. 3. #Py2SAIS Prototype
  4. 4. #Py2SAIS 1. Use case 2. Problem domain 3. Data set 4. Tools and techniques Requirements
  5. 5. #Py2SAIS Use Case +
  6. 6. #Py2SAIS Variety Country Points Region Tinta de Toro Spain 98 Toro Cabernet Sauvignon US 70 Napa Valley Macauley US 50 Knights Valley
  7. 7. #Py2SAIS ● Open-source web application ● Create and share live code examples ● Python code ● It empowers users with visualisation tools Jupyter Notebook
  8. 8. #Py2SAIS Spark Driver Process Executors
  9. 9. #Py2SAIS Demo
  10. 10. #Py2SAIS ● Easy to setup and get going ● Lots of visualisations to practise with ● Great method for proof of concept Conclusions
  11. 11. #Py2SAIS Production
  12. 12. #Py2SAIS 1. Cloud based for scale and portability 2. Tooling and techniques 3. Database/more robust store 4. Testing Next Steps
  13. 13. #Py2SAIS Applications that are: 1. designed to run in the cloud 2. scalable 3. modular 4. and resilient Cloud Native Applications
  14. 14. #Py2SAIS ● Allow you to package and isolate a runtime environment ● Easily portable to different environments ● Scalable ● Quick and easy to deploy Containers
  15. 15. #Py2SAIS Monolithic Architecture Microservices Architecture User Interface Business Logic Data Access Layer Database User Interface MicroserviceMicroserviceMicroservice Database Database Database
  16. 16. #Py2SAIS Kubernetes kubectl apiserver schedulerreplication controller kubelet pod pod pod proxy Node Node Node
  17. 17. #Py2SAIS An open source community working to empower intelligent applications on kubernetes Projects and tutorials to empower developers with machine learning techniques Radanaytics.io
  18. 18. #Py2SAIS Oshinko
  19. 19. #Py2SAIS Oshinko Deployment Oshinko source to image
  20. 20. #Py2SAIS Architecture Postgresql Spark Spark Spark Wine Map Application Load and Calculate Response Response Request Response Request Job Web Browser
  21. 21. #Py2SAIS Demo
  22. 22. #Py2SAIS # test command os::cmd::try_until_text # what to test 'oc new-app --template=oshinko-python-spark-build-dc -p APPLICATION_NAME=winemap -p GIT_URI=https://github.com/radanalyticsio/winemap.git # expected result 'Success'
  23. 23. #Py2SAIS ● Jupyter notebook for prototyping ● VISIT radanalytics.io ● Deploy your own cloud native applications @becky_simmonds rsimmond@redhat.com https://radanalytics.io/applications/wine-map Conclusion

×