Grubhub has chosen to adopt the Spark Big Data computing framework to underpin its internal Grubhub Data Platform. Spark was adopted very early by Silicon Valley FANG companies. What features make Spark a great computing platform for both analytical reporting and machine learning? The talk also offers tips on how to install PySpark on a Mac OS X system so one can play with PySpark without paying for a cloud cluster.
Meet Simon! He doesn't have a technology background, but he wants to be a programmer. Should he go to a boot camp for 17k, or read through the 29,900,000 results Google returned when he searched "Learn Python"? Or he can join the ChiPy mentorship program. While all of these will work, I would like to make his journey a bit more enjoyable by presenting a more natural, friendlier, and more interactive way to learn programming concepts. In this talk, we will look at functions, for loops, list comprehensions, and generators in a way that is easy for people like Simon to understand and use.
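As a rough illustration of the four concepts this talk names, here is a minimal sketch (not taken from the talk itself) that computes the same squares with a function wrapping a for loop, a list comprehension, and a generator. The function names and the sample list are made up for this example.

```python
def squares_loop(numbers):
    """Collect squares with an explicit for loop."""
    result = []
    for n in numbers:
        result.append(n * n)
    return result


def squares_comprehension(numbers):
    """Build the same list in one expression with a list comprehension."""
    return [n * n for n in numbers]


def squares_generator(numbers):
    """Yield squares one at a time instead of building a whole list."""
    for n in numbers:
        yield n * n


# Hypothetical sample data, chosen only for illustration.
numbers = [1, 2, 3, 4]

loop_result = squares_loop(numbers)
comprehension_result = squares_comprehension(numbers)
# A generator is lazy; list() consumes it so the values can be compared.
generator_result = list(squares_generator(numbers))

print(loop_result, comprehension_result, generator_result)
```

All three produce the same values; the generator differs in that it computes each square only when asked for the next one.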
Ever want to avoid installing Python packages with complex dependencies such as sklearn? Ever have permissions issues installing a package? Anaconda is the answer. This talk describes why to use it and how to get it set up.
Machine learning and record linkage: Finding duplicates or matching data when you don't have primary keys is one of the biggest challenges in preparing data for data science. At DataMade we have built an open source Python machine learning library to help developers, and a product, Dedupe.io, to help everyone else. We describe the problem and how we use machine learning to scale to tens of millions of records.
Every developer (eventually) writes tests: unit tests, integration tests, end-to-end tests, regression tests... All of those tests are necessary but can become a nightmare when you need to refactor some code. I personally don't like the amount of time I spend manually mocking my dependencies / functions / objects. This talk is about a simple docker-compose / pytest / mitm setup which aims at speeding up the mocking process and the maintenance of those mocks when refactoring or when updating the interface of your services. Q&A: Many of you deal with this mocking process regularly, so you can expect many comments / questions if you come to this talk :) Contact: Quentin Bayart, Software Engineer @ Nielsen, qbayart@hawk.iit.edu, https://github.com/QuentinBay. A couple of days before the presentation, I will push my demo to my GitHub, so you should be able to find it there after the presentation.
While Pandas is one of the most well-known Python libraries for working with array-like data, many users limit themselves to just two dimensions of data. This talk will walk through Pandas' MultiIndex DataFrames, which extend traditional DataFrames by enabling effective storage and manipulation of arbitrarily high-dimensional data in a 2-dimensional tabular structure. (If that sentence doesn't make sense yet, don't worry - it should by the end of the tutorial.) While the displayed version of a multiindexed DataFrame doesn't appear to be much more than a prettily-organized regular DataFrame, it's actually a pretty powerful structure if the data warrants its use. This talk is beginner friendly, and will start from the assumption of having never used Pandas, though some Pandas experience will aid understanding.
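To make "high-dimensional data in a 2-dimensional tabular structure" a little more concrete, here is a small sketch, assuming pandas is installed. The city, year, and quarter labels and the sales numbers are invented for illustration and do not come from the talk.

```python
import pandas as pd

# Three "dimensions" (city, year, quarter) stored as three index levels.
index = pd.MultiIndex.from_product(
    [["Boston", "Chicago"], [2017, 2018], ["Q1", "Q2"]],
    names=["city", "year", "quarter"],
)

# Hypothetical sales figures: one value per (city, year, quarter) combination.
sales = pd.DataFrame({"sales": list(range(8))}, index=index)

# Select along the first two levels with a partial key.
chicago_2018 = sales.loc[("Chicago", 2018), :]

# Take a cross-section on an inner level by name.
first_quarters = sales.xs("Q1", level="quarter")

# Pivot one index level into columns to compare years side by side.
by_year = sales["sales"].unstack("year")

print(sales)
print(by_year)
```

The printed `sales` frame looks like an ordinary table with grouped row labels, but each index level can be sliced, cross-sectioned, or pivoted independently.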
Everything in Python is an object and nothing is special. Python's built-in objects can be added, called, indexed, or with'd, and with a little magic, so can yours! Use of magic methods, those prefixed/suffixed with double underscores, can increase the flexibility of your code while also making it shorter and simpler.
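As a minimal sketch of what "added, called, indexed, or with'd" can look like for a user-defined class, the hypothetical `Playlist` below implements `__add__`, `__call__`, `__getitem__`, and `__enter__`/`__exit__`. The class and its behavior are invented for this example and are not from the talk.

```python
class Playlist:
    """A toy container that opts in to several built-in behaviors."""

    def __init__(self, songs):
        self.songs = list(songs)
        self.is_open = False

    def __add__(self, other):
        # playlist_a + playlist_b returns a new combined playlist.
        return Playlist(self.songs + other.songs)

    def __getitem__(self, position):
        # playlist[0] delegates indexing to the underlying list.
        return self.songs[position]

    def __len__(self):
        # len(playlist) reports the number of songs.
        return len(self.songs)

    def __call__(self, prefix):
        # playlist("A") returns the songs whose titles start with the prefix.
        return [song for song in self.songs if song.startswith(prefix)]

    def __enter__(self):
        # "with playlist as p:" marks the playlist open for the block.
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the block closes it; returning False lets exceptions propagate.
        self.is_open = False
        return False


combined = Playlist(["Alpha", "Beta"]) + Playlist(["Apex"])

with combined as playlist:
    opened_inside = playlist.is_open
    first_song = playlist[0]
    a_songs = playlist("A")

print(len(combined), first_song, a_songs)
```

None of these methods is called by name; Python invokes them when the object is used with `+`, `[]`, `()`, `len()`, or `with`.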
Walkthrough of `python-ls`, a new utility that allows users to interactively introspect Python objects.