Connecting with the Spark Community
On Tuesday night, the Paxata Lab was packed with people who came from all over the Bay Area to participate in the Spark Workshop on the Peninsula Meetup of the SF Big Analytics group, the first in a four-part series. As long-time advocates of Spark (we built the Paxata platform on Spark and released to our customers with release 1.0.0 back in 2014), we were excited to host an event for the community! 40+ people joined the Meetup and stayed until well past 9:30pm learning how to program with Spark, eating pizza and writing code!
The workshop was led by the incredible Holden Karau. She is currently working on Apache Spark at IBM, speaks frequently at conferences around the world, and is a great advocate for the open-source community (not to mention a huge Hello Kitty fan!). Here is a taste of what the meetup was like in Holden’s words:
What is Spark?
“What is Spark? It’s a really great general purpose distributed system. It has a nice API, nicer than Map Reduce, and it has a good optimizer that allows me to think less.”
This meetup was part intro-to-Spark and part hands-on exercise with cheerful, helpful, and super smart TA’s Rachel Warren, Anya Bida, and Sara Asher from Alpine Data. 50 people learned about RDDs, the Spark Context, and dove into a word count example. As the instructors explained, word count examples are required for any intro-to-Big-Data-coding sessions (think Hive, Flink, Map-Reduce). Those are the rules!
Slides from Holden Karau Lighting Fast Cluster Computing with Python (and just a wee bit of Scala) are available here.
More from Holden during the meetup –
Comparing Spark to Map Reduce
“Resiliency is achieved in a different way in Spark than traditional MapReduce. In MapReduce, resiliency is achieved because I’m always writing to a whole bunch of disks. It’s a good strategy, but it’s slow.
Spark’s creators said that because node failure doesn’t happen that often, I don’t have to write everything to disk. If we lose a node, Spark just recomputes the data for that node.”
Holden and Rachel are also working on a follow up to the book Learning about Spark with a new book High Performance Spark.
Paxata will be hosting a Data Prepsters Meetup with Tableau and the TAM group at our offices in Redwood City on Wednesday, May 18th from 6pm-8:30pm. The topic is “Data Freedom – Tableau shares how to truly that your reality.” There will be sushi, data blending discussions and networking.
The next SF Big Analytics Meetup is May 3rd at the IBM Spark Technology Center. Part two in the “Big Data Toolbox” series will be at the Alpine Data Labs on May 17th.
DataRobot is the leader in Value-Driven AI – a unique and collaborative approach to AI that combines our open AI platform, deep AI expertise and broad use-case implementation to improve how customers run, grow and optimize their business. The DataRobot AI Platform is the only complete AI lifecycle platform that interoperates with your existing investments in data, applications and business processes, and can be deployed on-prem or in any cloud environment. DataRobot and our partners have a decade of world-class AI expertise collaborating with AI teams (data scientists, business and IT), removing common blockers and developing best practices to successfully navigate projects that result in faster time to value, increased revenue and reduced costs. DataRobot customers include 40% of the Fortune 50, 8 of top 10 US banks, 7 of the top 10 pharmaceutical companies, 7 of the top 10 telcos, 5 of top 10 global manufacturers.
We will contact you shortly
We’re almost there! These are the next steps:
- Look out for an email from DataRobot with a subject line: Your Subscription Confirmation.
- Click the confirmation link to approve your consent.
- Done! You have now opted to receive communications about DataRobot’s products and services.
Didn’t receive the email? Please make sure to check your spam or junk folders.
How AI Helps Address Customer and Employee ChurnJune 8, 2023· 4 min read
Optimizing Large Language Model Performance with ONNX on DataRobot MLOpsJune 1, 2023· 11 min read
Belong @ DataRobot: AAPI Heritage Month with the ACTnow! CommunityMay 25, 2023· 3 min read
Learn how AI can help businesses reduce customer and employee churn with granular insights and targeted intervention tactics. Explore DataRobot AI Platform.
Many companies are experiencing mounting pressure to have a generative AI strategy, but most are not equipped to meaningfully put generative AI to work. For AI leaders, there are deeper questions you need to ask as you consider your path with generative AI.