Summary
Role | Machine Learning / Data Engineer
Experience | 3–6 years
Company Size | 134 employees
Hiring Period | Rolling (open until filled)
Location | 141 Samseong-dong, Gangnam-gu, Seoul
Tech Stack
Scala
Java
Python
Apache Hadoop
Apache Airflow
Apache Spark
Job Description
WHAT YOU WILL DO
- Design distributed, high-volume ETL data pipelines that power SendBird's analytics and machine learning products
- Build production services using open-source technologies such as Airflow, Kafka, Spark, and Elasticsearch, and AWS cloud infrastructure such as EMR, Kinesis, Aurora, S3, Athena, and Redshift
- Lead the development of analytics and machine learning products, services, and tools in Python, Java, and Scala
- Collaborate with other teams and work cross-functionally on data-related product initiatives
Qualifications
WHO YOU ARE
- 2+ years of work experience in building ETL pipelines in production
- Working knowledge of message queuing, stream processing, and highly scalable data stores
- Fluency in several programming languages such as Python, Java, or Scala
- Strong analytical skills for working with unstructured datasets
- Ability to find the optimal solution given resource constraints, with a solid understanding of under-engineering and over-engineering
Preferred Qualifications
EXPERIENCE AND SKILLS
- Work experience in the AWS data pipeline ecosystem
- Work experience in building natural language processing products
- Familiarity with Airflow, Spark, and Hadoop
- Understanding of RDBMS, NoSQL, and distributed databases